Introduction¶
Chronic kidney disease is the gradual loss of kidney function, often with no symptoms manifesting until late in the disease. 1 It is difficult to know the burden of the disease since there are no accurate diagnostic tests, according to research done here. It can be characterized by uremic frost; however, a careful diagnostic workup should be followed, such as kidney function tests, a urine dipstick test (for example, specific gravity, where persistently low values around 1.010 could mean the patient has kidney damage), observation of the urine under the microscope, and identification of casts. These and other tests can help make a proper diagnosis.
In this notebook, we'll use data with 25 features that could be indicative of chronic kidney disease to see if predictive modelling could help us figure out which patients have chronic kidney disease. You can read more about the dataset using this link. Let's proceed to exploratory data analysis.
First, I import all the packages that could be useful for wrangling, visualization, and statistical modelling. If a package appears here but is never used, it likely slipped through from an earlier iteration of the notebook.
import numpy as np                  # numeric processing
import pandas as pd                 # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt     # visualization
import ydata_profiling              # interactive profiling reports
from functools import *             # wildcard kept from the original; narrow it once the needed helpers are known
from IPython.display import HTML
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, classification_report
from sklearn.model_selection import (GridSearchCV, StratifiedShuffleSplit,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
Turi Create is a machine learning library by Apple. There is some functionality in it that I was interested in trying; in future, I may add it to the notebook. If you want more information about the library, find it here.
#!pip install turicreate -q
Loading data and Exploratory data analysis¶
In this analysis, we'll do predictive modelling in hopes of finding a model which will be able to classify the patients appropriately.
Download the dataset from here and load it into the notebook.
!chmod a+x get_data.sh
!./get_data.sh
Downloading dataset from kaggle
This may take a few minutes...
Link: https://www.kaggle.com/mansoordaku/ckdisease
Attempt 1 of 5...
Warning: Looks like you're using an outdated API Version, please consider updating (server 1.7.4.2 / client 1.6.17)
Dataset URL: https://www.kaggle.com/datasets/mansoordaku/ckdisease
License(s): unknown
Downloading ckdisease.zip to /home/stormbird/Desktop/chronic-kidney-disease-kaggle
100%|██████████████████████████████████████| 9.51k/9.51k [00:00<00:00, 10.8MB/s]
Dataset downloaded successfully.
Archive:  ckdisease.zip
  inflating: kidney_disease.csv
Dataset downloaded successfully and moved to data/input folder
kidney_disease.csv
# load the dataset with pandas read_csv function
df = pd.read_csv('data/input/kidney_disease.csv', index_col="id")
# give the dtypes the columns would have if the data were squeaky clean
dtypes = {
'id' : np.int32,
'age' : np.int32,
'bp' : np.float32,
'sg' : object, # category
'al' : object, # category # mistake
'su' : object, #category # mistake
'rbc' : object, # category
'pc' : object, # category
'pcc' : object, # category
'ba' : object, # category
'bgr' : np.float32,
'bu' : np.int32,
'sc' : np.float32,
'sod': np.int32,
'pot' : np.float32,
'hemo' : np.float32,
'pcv' : np.int32,
'wc' : np.int32,
'rc' : np.int32,
'htn' : object,
'dm' : object,
'cad' : object,
'appet': object,
'pe' : object,
'ane' : object,
'class': object}
# another way of reading in the dataset, especially for very big files (1 GB and up),
# is dask (requires: import dask.dataframe as dd)
# df2 = dd.read_csv('../input/kidney_disease.csv', dtype=dtypes)
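If the file were clean, the schema above could be applied directly at load time via `read_csv`'s `dtype` parameter. A minimal sketch on a hypothetical clean fragment (the real file's stray tab characters would make some of these casts fail, which is why the notebook cleans first):

```python
import io

import numpy as np
import pandas as pd

# a tiny, made-up, clean stand-in for kidney_disease.csv
clean_csv = io.StringIO("id,age,bp,htn\n0,48,80.0,yes\n1,7,50.0,no\n")

# applying a (subset of the) schema at load time avoids a separate astype pass
schema = {"age": np.int32, "bp": np.float32, "htn": object}
demo = pd.read_csv(clean_csv, index_col="id", dtype=schema)
```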
# see the first 10 observations, transposed so each row is a feature
# (transposing makes a wide dataset much easier to scan)
df.head(10).transpose()
| id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| age | 48.0 | 7.0 | 62.0 | 48.0 | 51.0 | 60.0 | 68.0 | 24.0 | 52.0 | 53.0 |
| bp | 80.0 | 50.0 | 80.0 | 70.0 | 80.0 | 90.0 | 70.0 | NaN | 100.0 | 90.0 |
| sg | 1.02 | 1.02 | 1.01 | 1.005 | 1.01 | 1.015 | 1.01 | 1.015 | 1.015 | 1.02 |
| al | 1.0 | 4.0 | 2.0 | 4.0 | 2.0 | 3.0 | 0.0 | 2.0 | 3.0 | 2.0 |
| su | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 |
| rbc | NaN | NaN | normal | normal | normal | NaN | NaN | normal | normal | abnormal |
| pc | normal | normal | normal | abnormal | normal | NaN | normal | abnormal | abnormal | abnormal |
| pcc | notpresent | notpresent | notpresent | present | notpresent | notpresent | notpresent | notpresent | present | present |
| ba | notpresent | notpresent | notpresent | notpresent | notpresent | notpresent | notpresent | notpresent | notpresent | notpresent |
| bgr | 121.0 | NaN | 423.0 | 117.0 | 106.0 | 74.0 | 100.0 | 410.0 | 138.0 | 70.0 |
| bu | 36.0 | 18.0 | 53.0 | 56.0 | 26.0 | 25.0 | 54.0 | 31.0 | 60.0 | 107.0 |
| sc | 1.2 | 0.8 | 1.8 | 3.8 | 1.4 | 1.1 | 24.0 | 1.1 | 1.9 | 7.2 |
| sod | NaN | NaN | NaN | 111.0 | NaN | 142.0 | 104.0 | NaN | NaN | 114.0 |
| pot | NaN | NaN | NaN | 2.5 | NaN | 3.2 | 4.0 | NaN | NaN | 3.7 |
| hemo | 15.4 | 11.3 | 9.6 | 11.2 | 11.6 | 12.2 | 12.4 | 12.4 | 10.8 | 9.5 |
| pcv | 44 | 38 | 31 | 32 | 35 | 39 | 36 | 44 | 33 | 29 |
| wc | 7800 | 6000 | 7500 | 6700 | 7300 | 7800 | NaN | 6900 | 9600 | 12100 |
| rc | 5.2 | NaN | NaN | 3.9 | 4.6 | 4.4 | NaN | 5 | 4.0 | 3.7 |
| htn | yes | no | no | yes | no | yes | no | no | yes | yes |
| dm | yes | no | yes | no | no | yes | no | yes | yes | yes |
| cad | no | no | no | no | no | no | no | no | no | no |
| appet | good | good | poor | poor | good | good | good | good | good | poor |
| pe | no | no | no | yes | no | yes | no | yes | no | no |
| ane | no | no | yes | yes | no | no | no | no | yes | yes |
| classification | ckd | ckd | ckd | ckd | ckd | ckd | ckd | ckd | ckd | ckd |
# see the column names
df.columns
Index(['age', 'bp', 'sg', 'al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr', 'bu',
'sc', 'sod', 'pot', 'hemo', 'pcv', 'wc', 'rc', 'htn', 'dm', 'cad',
'appet', 'pe', 'ane', 'classification'],
dtype='object')
# see a concise summary of the dataset
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 400 entries, 0 to 399
Data columns (total 25 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   age             391 non-null    float64
 1   bp              388 non-null    float64
 2   sg              353 non-null    float64
 3   al              354 non-null    float64
 4   su              351 non-null    float64
 5   rbc             248 non-null    object 
 6   pc              335 non-null    object 
 7   pcc             396 non-null    object 
 8   ba              396 non-null    object 
 9   bgr             356 non-null    float64
 10  bu              381 non-null    float64
 11  sc              383 non-null    float64
 12  sod             313 non-null    float64
 13  pot             312 non-null    float64
 14  hemo            348 non-null    float64
 15  pcv             330 non-null    object 
 16  wc              295 non-null    object 
 17  rc              270 non-null    object 
 18  htn             398 non-null    object 
 19  dm              398 non-null    object 
 20  cad             398 non-null    object 
 21  appet           399 non-null    object 
 22  pe              399 non-null    object 
 23  ane             399 non-null    object 
 24  classification  400 non-null    object 
dtypes: float64(11), object(14)
memory usage: 81.2+ KB
25 columns and a variable number of non-null observations per feature/variable.
There are 400 rows in total, so any column with fewer than 400 non-null entries has missing data.
# display summary statistics of each column
# this helps me confirm my assertion on missing data
df.describe(include="all").transpose()
| count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 391.0 | NaN | NaN | NaN | 51.483376 | 17.169714 | 2.0 | 42.0 | 55.0 | 64.5 | 90.0 |
| bp | 388.0 | NaN | NaN | NaN | 76.469072 | 13.683637 | 50.0 | 70.0 | 80.0 | 80.0 | 180.0 |
| sg | 353.0 | NaN | NaN | NaN | 1.017408 | 0.005717 | 1.005 | 1.01 | 1.02 | 1.02 | 1.025 |
| al | 354.0 | NaN | NaN | NaN | 1.016949 | 1.352679 | 0.0 | 0.0 | 0.0 | 2.0 | 5.0 |
| su | 351.0 | NaN | NaN | NaN | 0.450142 | 1.099191 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 |
| rbc | 248 | 2 | normal | 201 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| pc | 335 | 2 | normal | 259 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| pcc | 396 | 2 | notpresent | 354 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ba | 396 | 2 | notpresent | 374 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| bgr | 356.0 | NaN | NaN | NaN | 148.036517 | 79.281714 | 22.0 | 99.0 | 121.0 | 163.0 | 490.0 |
| bu | 381.0 | NaN | NaN | NaN | 57.425722 | 50.503006 | 1.5 | 27.0 | 42.0 | 66.0 | 391.0 |
| sc | 383.0 | NaN | NaN | NaN | 3.072454 | 5.741126 | 0.4 | 0.9 | 1.3 | 2.8 | 76.0 |
| sod | 313.0 | NaN | NaN | NaN | 137.528754 | 10.408752 | 4.5 | 135.0 | 138.0 | 142.0 | 163.0 |
| pot | 312.0 | NaN | NaN | NaN | 4.627244 | 3.193904 | 2.5 | 3.8 | 4.4 | 4.9 | 47.0 |
| hemo | 348.0 | NaN | NaN | NaN | 12.526437 | 2.912587 | 3.1 | 10.3 | 12.65 | 15.0 | 17.8 |
| pcv | 330 | 44 | 41 | 21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| wc | 295 | 92 | 9800 | 11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| rc | 270 | 49 | 5.2 | 18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| htn | 398 | 2 | no | 251 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| dm | 398 | 5 | no | 258 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| cad | 398 | 3 | no | 362 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| appet | 399 | 2 | good | 317 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| pe | 399 | 2 | no | 323 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ane | 399 | 2 | no | 339 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| classification | 400 | 3 | ckd | 248 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# looking at the variables interactively
profile = ydata_profiling.ProfileReport(df)
profile
The good news is that we can work with the current state of the columns since they have been labelled consistently. The bad news is that we have a lot of missing data in this dataset. Let's proceed and find the number of missing values per column, and check whether the classes are balanced or unbalanced. The profiler did this work already, but sometimes it is good to confirm it yourself.
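The class balance check can be confirmed with `value_counts`. A sketch on a toy stand-in for the notebook's `classification` column (the real one has 400 entries, including a couple of tab-suffixed `'ckd\t'` labels):

```python
import pandas as pd

# toy stand-in for df['classification']: 248 'ckd', 2 stray 'ckd\t', 150 'notckd'
labels = pd.Series(["ckd"] * 248 + ["notckd"] * 150 + ["ckd\t"] * 2,
                   name="classification")

# strip whitespace first so 'ckd\t' folds into 'ckd', then count each label
counts = labels.str.strip().value_counts()

# normalize=True gives proportions instead of raw counts
proportions = labels.str.strip().value_counts(normalize=True)
```

Stripping before counting matters here: without it, the stray tab entries show up as a bogus third class.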
# count the number of missing observations per column
# isnull() returns a boolean for every value, asking whether it is missing,
# then sum() adds up all instances of NaN (Not a Number) per column
missing_values = df.isnull().sum()
# calculate the percentage of missing values per column
# by dividing the counts above by the number of rows in the dataframe
# (you could use len(df) instead of df.index.size)
missing_count_pct = ((missing_values / df.index.size) * 100)
# see how many observations are missing
print(missing_count_pct)
age                2.25
bp                 3.00
sg                11.75
al                11.50
su                12.25
rbc               38.00
pc                16.25
pcc                1.00
ba                 1.00
bgr               11.00
bu                 4.75
sc                 4.25
sod               21.75
pot               22.00
hemo              13.00
pcv               17.50
wc                26.25
rc                32.50
htn                0.50
dm                 0.50
cad                0.50
appet              0.25
pe                 0.25
ane                0.25
classification     0.00
dtype: float64
# use a boolean mask on the missing-value percentages to filter out columns
# whose share of missing observations is greater than 25 percent
# (25 is a judgment call based on how much data is missing in this dataset;
# 25-50 percent missing is normally a red flag since most of the data is gone)
columns_to_drop = missing_count_pct[missing_count_pct > 25].index
# drop the columns that exceed that threshold and save the result in df_dropped
df_dropped = df.drop(columns_to_drop, axis=1)
# number of columns remaining after filtering
df.columns.size - df_dropped.columns.size
# only three columns are lost
3
I hate losing columns, so I won't throw everything away. I will keep these columns and, while doing predictive modelling, use the different variants of the dataset to see if there is any boost in results. In the meantime, let's look at the code book to come up with a hypothesis about which columns are the most important, and convert each column to a type that will speed up computation during training.
# look at the code book on kaggle and write which columns could be useful here
According to the original site where the data was found (here), this is what the columns mean. I'll put a star on the columns I think are important, drawing on my background in medical laboratory science. On a second run through this notebook we could explore only the starred columns, and lastly use a technique called singular value decomposition to figure out which ones are the most important.
age - age
bp - blood pressure *
sg - specific gravity *
al - albumin *
su - sugar *
rbc - red blood cells *
pc - pus cell *
pcc - pus cell clumps *
ba - bacteria *
bgr - blood glucose random
bu - blood urea *
sc - serum creatinine
sod - sodium
pot - potassium
hemo - hemoglobin *
pcv - packed cell volume
wc - white blood cell count *
rc - red blood cell count *
htn - hypertension *
dm - diabetes mellitus *
cad - coronary artery disease *
appet - appetite *
pe - pedal edema *
ane - anemia *
class - class *
# check the column types to figure out the best next steps for dtype conversion
df.dtypes
age                float64
bp                 float64
sg                 float64
al                 float64
su                 float64
rbc                 object
pc                  object
pcc                 object
ba                  object
bgr                float64
bu                 float64
sc                 float64
sod                float64
pot                float64
hemo               float64
pcv                 object
wc                  object
rc                  object
htn                 object
dm                  object
cad                 object
appet               object
pe                  object
ane                 object
classification      object
dtype: object
Review the columns from the original codebook to determine the datatypes, then make a schema which we can follow as I import the dataset.
# fix the columns to be of the categorical type
# if the value is missing replace the NA with the word miss
constant_imputer = SimpleImputer(strategy="constant", fill_value = "miss")
# apply it to categorical columns
df[["rbc"]] = constant_imputer.fit_transform(df[["rbc"]])
df[["pcc"]] = constant_imputer.fit_transform(df[["pcc"]])
# convert the types to categorical
# (a loop keeps this DRY instead of one astype call per column)
for col in ['rbc', 'pc', 'pcc', 'ba', 'appet', 'pe', 'ane',
            'classification', 'htn', 'dm', 'cad']:
    df[col] = df[col].astype("category")
# confirm the dtypes now
df.dtypes
age                float64
bp                 float64
sg                 float64
al                 float64
su                 float64
rbc               category
pc                category
pcc               category
ba                category
bgr                float64
bu                 float64
sc                 float64
sod                float64
pot                float64
hemo               float64
pcv                 object
wc                  object
rc                  object
htn               category
dm                category
cad               category
appet             category
pe                category
ane               category
classification    category
dtype: object
# list the column names again
df.columns
Index(['age', 'bp', 'sg', 'al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr', 'bu',
'sc', 'sod', 'pot', 'hemo', 'pcv', 'wc', 'rc', 'htn', 'dm', 'cad',
'appet', 'pe', 'ane', 'classification'],
dtype='object')
# make a copy of the whole dataset
df_copy = df.copy()
# remove the target column for the other uses in the next steps
#df = df.drop("classification", axis = 1)
# use boolean masks to figure out which columns are object, numeric, or category
# for the other preprocessing steps in the workflow
object_columns = df.dtypes == "object"
numeric_columns = df.dtypes == "float64"
category_columns = df.dtypes == "category"
# fix the data-entry artifacts; ideally this is one of the first steps after the df.dtypes check
# the '\t?' entries are replaced with -999 (any sentinel would do) to flag them as out-of-range,
# and the dtypes are narrowed to 32-bit to save memory
df['pcv'] = df['pcv'].replace("\t?", -999).fillna(0).astype("int32")
df['wc'] = df['wc'].replace("\t?", -999).fillna(0).astype("int32")
df['rc'] = df['rc'].replace("\t?", -999).fillna(0).astype("float32")
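Rather than hard-coding the `'\t?'` string, `pd.to_numeric` with `errors='coerce'` can surface every non-numeric entry in one pass. A sketch with made-up values mimicking `pcv`:

```python
import pandas as pd

# made-up values mimicking pcv: numeric strings plus a stray '\t?' entry and a NaN
pcv_raw = pd.Series(["44", "38", "\t?", None, "41"])

# errors='coerce' turns anything unparseable into NaN
pcv_num = pd.to_numeric(pcv_raw, errors="coerce")

# entries that were present but failed to parse are the data-entry problems
bad_entries = pcv_raw[pcv_raw.notna() & pcv_num.isna()]
```

Comparing the coerced series against the original separates genuine NaNs from entries that merely failed to parse.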
# exploring another imputation strategy that uses the median
# median_imputer = SimpleImputer(strategy="median")
# df[["pcv"]] = median_imputer.fit_transform(df[["pcv"]])
# df[["wc"]] = median_imputer.fit_transform(df[["wc"]])
# df[["rc"]] = median_imputer.fit_transform(df[["rc"]])
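If the median strategy is revived, note that `SimpleImputer` expects 2-D input, so the columns must be passed with double brackets (`df[["pcv"]]`, not `df["pcv"]`). A minimal sketch on a toy frame:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# toy frame standing in for df; the median of [44, 38, 32] is 38
toy = pd.DataFrame({"pcv": [44.0, 38.0, None, 32.0]})

median_imputer = SimpleImputer(strategy="median")
# double brackets keep the input 2-D, which SimpleImputer requires
toy[["pcv"]] = median_imputer.fit_transform(toy[["pcv"]])
```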
# write code to extract columns of the type object and numeric
# Make a boolean mask for categorical columns
cat_mask_obj = (df.dtypes == "object") | (df.dtypes == "category")
# Get list of categorical column names
cat_mask_object = df.columns[cat_mask_obj].tolist()
# now for numerical columns
# anything that was parsed as float64 is numeric: make a boolean mask for that
cat_mask_numeric = (df.dtypes == "float64")
cat_mask_numeric = df.columns[cat_mask_numeric].tolist()
# see the result: categorical column names on the left, numeric column names on the right
print(cat_mask_object, "\n", cat_mask_numeric)
['rbc', 'pc', 'pcc', 'ba', 'htn', 'dm', 'cad', 'appet', 'pe', 'ane', 'classification'] 
 ['age', 'bp', 'sg', 'al', 'su', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo']
# convert all float64 columns to float32 to speed up computation in the subsequent steps
# and fill the missing values with 0 so every entry is numeric
numeric_columns_float32 = df[cat_mask_numeric].astype("float32").fillna(0)
# confirm the conversion worked
numeric_columns_float32.dtypes
age     float32
bp      float32
sg      float32
al      float32
su      float32
bgr     float32
bu      float32
sc      float32
sod     float32
pot     float32
hemo    float32
dtype: object
# Task: convert the category columns and object columns to the right types on import
# (the dtypes can also be set when reading the dataset in;
# noting it here as a reminder for the next pass over this notebook)
There are some columns that were wrongly parsed because of the NAs. They include: pcv (numerical, int32) and rc (numerical, float32). I can either interpolate the missing values, depending on how they look in a plot, or use the mean/median to fill them in.
# it makes sense that some individuals were simply not sampled, so NAs in these rows are expected
# these two columns had data-entry problems (handled with replace above)
# note: fillna(0, inplace=True) returns None, so its result must not be assigned back
#df['pcv'] = df['pcv'].fillna(0)
#df['rc'] = df['rc'].fillna(0)
#df['wc'] = df['wc'].fillna(0)
# finding the number of null or NA values in the columns
pd.isnull(df).sum()
age                9
bp                12
sg                47
al                46
su                49
rbc                0
pc                65
pcc                0
ba                 4
bgr               44
bu                19
sc                17
sod               87
pot               88
hemo              52
pcv                0
wc                 0
rc                 0
htn                2
dm                 2
cad                2
appet              1
pe                 1
ane                1
classification     0
dtype: int64
# checking the dtypes once more
df.dtypes
age                float64
bp                 float64
sg                 float64
al                 float64
su                 float64
rbc               category
pc                category
pcc               category
ba                category
bgr                float64
bu                 float64
sc                 float64
sod                float64
pot                float64
hemo               float64
pcv                  int32
wc                   int32
rc                 float32
htn               category
dm                category
cad               category
appet             category
pe                category
ane               category
classification    category
dtype: object
# impute the remaining categorical columns, then concatenate them with the
# numeric columns to build the full dataset (and later X and y)
df[cat_mask_object] = constant_imputer.fit_transform(df[cat_mask_object])
# check for missing values
print(df[cat_mask_object].isnull().sum())
print("*" * 100)
print(numeric_columns_float32.isnull().sum())
rbc               0
pc                0
pcc               0
ba                0
htn               0
dm                0
cad               0
appet             0
pe                0
ane               0
classification    0
dtype: int64
****************************************************************************************************
age     0
bp      0
sg      0
al      0
su      0
bgr     0
bu      0
sc      0
sod     0
pot     0
hemo    0
dtype: int64
# bring the columns together with pd.concat
df_clean = pd.concat([numeric_columns_float32, df[cat_mask_object]], axis = 1)
# check the shape of the columns
df_clean.shape
(400, 22)
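From here, a frame like df_clean can feed the train_test_split imported earlier. A sketch on a synthetic stand-in frame (get_dummies is used for one-hot encoding here; the DictVectorizer imported at the top would be an alternative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# synthetic stand-in for df_clean: numeric features, one categorical, and the target
demo = pd.DataFrame({
    "age":  [48.0, 7.0, 62.0, 51.0] * 25,
    "hemo": [15.4, 11.3, 9.6, 11.6] * 25,
    "htn":  ["yes", "no", "no", "no"] * 25,
    "classification": ["ckd", "ckd", "notckd", "notckd"] * 25,
})

# one-hot encode the categorical features and separate the target
X = pd.get_dummies(demo.drop("classification", axis=1))
y = demo["classification"]

# stratify keeps the ckd/notckd ratio comparable in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
```

Stratifying matters for this dataset because the classes are imbalanced (roughly 250 ckd to 150 notckd), so a plain random split could skew the test set.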
# just see the first 10 observations
df_clean.head(10)
# HTML(df_clean.to_html()) see the whole dataframe in HTML format
| age | bp | sg | al | su | bgr | bu | sc | sod | pot | ... | pc | pcc | ba | htn | dm | cad | appet | pe | ane | classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | |||||||||||||||||||||
| 0 | 48.0 | 80.0 | 1.020 | 1.0 | 0.0 | 121.0 | 36.0 | 1.2 | 0.0 | 0.0 | ... | normal | notpresent | notpresent | yes | yes | no | good | no | no | ckd |
| 1 | 7.0 | 50.0 | 1.020 | 4.0 | 0.0 | 0.0 | 18.0 | 0.8 | 0.0 | 0.0 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | ckd |
| 2 | 62.0 | 80.0 | 1.010 | 2.0 | 3.0 | 423.0 | 53.0 | 1.8 | 0.0 | 0.0 | ... | normal | notpresent | notpresent | no | yes | no | poor | no | yes | ckd |
| 3 | 48.0 | 70.0 | 1.005 | 4.0 | 0.0 | 117.0 | 56.0 | 3.8 | 111.0 | 2.5 | ... | abnormal | present | notpresent | yes | no | no | poor | yes | yes | ckd |
| 4 | 51.0 | 80.0 | 1.010 | 2.0 | 0.0 | 106.0 | 26.0 | 1.4 | 0.0 | 0.0 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | ckd |
| 5 | 60.0 | 90.0 | 1.015 | 3.0 | 0.0 | 74.0 | 25.0 | 1.1 | 142.0 | 3.2 | ... | miss | notpresent | notpresent | yes | yes | no | good | yes | no | ckd |
| 6 | 68.0 | 70.0 | 1.010 | 0.0 | 0.0 | 100.0 | 54.0 | 24.0 | 104.0 | 4.0 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | ckd |
| 7 | 24.0 | 0.0 | 1.015 | 2.0 | 4.0 | 410.0 | 31.0 | 1.1 | 0.0 | 0.0 | ... | abnormal | notpresent | notpresent | no | yes | no | good | yes | no | ckd |
| 8 | 52.0 | 100.0 | 1.015 | 3.0 | 0.0 | 138.0 | 60.0 | 1.9 | 0.0 | 0.0 | ... | abnormal | present | notpresent | yes | yes | no | good | no | yes | ckd |
| 9 | 53.0 | 90.0 | 1.020 | 2.0 | 0.0 | 70.0 | 107.0 | 7.2 | 114.0 | 3.7 | ... | abnormal | present | notpresent | yes | yes | no | poor | no | yes | ckd |
10 rows × 22 columns
# now see the bottom 10
df_clean.tail(10)
| age | bp | sg | al | su | bgr | bu | sc | sod | pot | ... | pc | pcc | ba | htn | dm | cad | appet | pe | ane | classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | |||||||||||||||||||||
| 390 | 52.0 | 80.0 | 1.025 | 0.0 | 0.0 | 99.0 | 25.0 | 0.8 | 135.0 | 3.7 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 391 | 36.0 | 80.0 | 1.025 | 0.0 | 0.0 | 85.0 | 16.0 | 1.1 | 142.0 | 4.1 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 392 | 57.0 | 80.0 | 1.020 | 0.0 | 0.0 | 133.0 | 48.0 | 1.2 | 147.0 | 4.3 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 393 | 43.0 | 60.0 | 1.025 | 0.0 | 0.0 | 117.0 | 45.0 | 0.7 | 141.0 | 4.4 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 394 | 50.0 | 80.0 | 1.020 | 0.0 | 0.0 | 137.0 | 46.0 | 0.8 | 139.0 | 5.0 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 395 | 55.0 | 80.0 | 1.020 | 0.0 | 0.0 | 140.0 | 49.0 | 0.5 | 150.0 | 4.9 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 396 | 42.0 | 70.0 | 1.025 | 0.0 | 0.0 | 75.0 | 31.0 | 1.2 | 141.0 | 3.5 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 397 | 12.0 | 80.0 | 1.020 | 0.0 | 0.0 | 100.0 | 26.0 | 0.6 | 137.0 | 4.4 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 398 | 17.0 | 60.0 | 1.025 | 0.0 | 0.0 | 114.0 | 50.0 | 1.0 | 135.0 | 4.9 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
| 399 | 58.0 | 80.0 | 1.025 | 0.0 | 0.0 | 131.0 | 18.0 | 1.1 | 141.0 | 3.5 | ... | normal | notpresent | notpresent | no | no | no | good | no | no | notckd |
10 rows × 22 columns
HTML(df.to_html()) # just looking for something I may have missed in the pandas profiling
| age | bp | sg | al | su | rbc | pc | pcc | ba | bgr | bu | sc | sod | pot | hemo | pcv | wc | rc | htn | dm | cad | appet | pe | ane | classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | |||||||||||||||||||||||||
| 0 | 48.0 | 80.0 | 1.020 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 121.0 | 36.0 | 1.20 | NaN | NaN | 15.4 | 44 | 7800 | 5.2 | yes | yes | no | good | no | no | ckd |
| 1 | 7.0 | 50.0 | 1.020 | 4.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | 18.0 | 0.80 | NaN | NaN | 11.3 | 38 | 6000 | 0.0 | no | no | no | good | no | no | ckd |
| 2 | 62.0 | 80.0 | 1.010 | 2.0 | 3.0 | normal | normal | notpresent | notpresent | 423.0 | 53.0 | 1.80 | NaN | NaN | 9.6 | 31 | 7500 | 0.0 | no | yes | no | poor | no | yes | ckd |
| 3 | 48.0 | 70.0 | 1.005 | 4.0 | 0.0 | normal | abnormal | present | notpresent | 117.0 | 56.0 | 3.80 | 111.0 | 2.5 | 11.2 | 32 | 6700 | 3.9 | yes | no | no | poor | yes | yes | ckd |
| 4 | 51.0 | 80.0 | 1.010 | 2.0 | 0.0 | normal | normal | notpresent | notpresent | 106.0 | 26.0 | 1.40 | NaN | NaN | 11.6 | 35 | 7300 | 4.6 | no | no | no | good | no | no | ckd |
| 5 | 60.0 | 90.0 | 1.015 | 3.0 | 0.0 | miss | miss | notpresent | notpresent | 74.0 | 25.0 | 1.10 | 142.0 | 3.2 | 12.2 | 39 | 7800 | 4.4 | yes | yes | no | good | yes | no | ckd |
| 6 | 68.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 100.0 | 54.0 | 24.00 | 104.0 | 4.0 | 12.4 | 36 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 7 | 24.0 | NaN | 1.015 | 2.0 | 4.0 | normal | abnormal | notpresent | notpresent | 410.0 | 31.0 | 1.10 | NaN | NaN | 12.4 | 44 | 6900 | 5.0 | no | yes | no | good | yes | no | ckd |
| 8 | 52.0 | 100.0 | 1.015 | 3.0 | 0.0 | normal | abnormal | present | notpresent | 138.0 | 60.0 | 1.90 | NaN | NaN | 10.8 | 33 | 9600 | 4.0 | yes | yes | no | good | no | yes | ckd |
| 9 | 53.0 | 90.0 | 1.020 | 2.0 | 0.0 | abnormal | abnormal | present | notpresent | 70.0 | 107.0 | 7.20 | 114.0 | 3.7 | 9.5 | 29 | 12100 | 3.7 | yes | yes | no | poor | no | yes | ckd |
| 10 | 50.0 | 60.0 | 1.010 | 2.0 | 4.0 | miss | abnormal | present | notpresent | 490.0 | 55.0 | 4.00 | NaN | NaN | 9.4 | 28 | 0 | 0.0 | yes | yes | no | good | no | yes | ckd |
| 11 | 63.0 | 70.0 | 1.010 | 3.0 | 0.0 | abnormal | abnormal | present | notpresent | 380.0 | 60.0 | 2.70 | 131.0 | 4.2 | 10.8 | 32 | 4500 | 3.8 | yes | yes | no | poor | yes | no | ckd |
| 12 | 68.0 | 70.0 | 1.015 | 3.0 | 1.0 | miss | normal | present | notpresent | 208.0 | 72.0 | 2.10 | 138.0 | 5.8 | 9.7 | 28 | 12200 | 3.4 | yes | yes | yes | poor | yes | no | ckd |
| 13 | 68.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 98.0 | 86.0 | 4.60 | 135.0 | 3.4 | 9.8 | 0 | 0 | 0.0 | yes | yes | yes | poor | yes | no | ckd |
| 14 | 68.0 | 80.0 | 1.010 | 3.0 | 2.0 | normal | abnormal | present | present | 157.0 | 90.0 | 4.10 | 130.0 | 6.4 | 5.6 | 16 | 11000 | 2.6 | yes | yes | yes | poor | yes | no | ckd |
| 15 | 40.0 | 80.0 | 1.015 | 3.0 | 0.0 | miss | normal | notpresent | notpresent | 76.0 | 162.0 | 9.60 | 141.0 | 4.9 | 7.6 | 24 | 3800 | 2.8 | yes | no | no | good | no | yes | ckd |
| 16 | 47.0 | 70.0 | 1.015 | 2.0 | 0.0 | miss | normal | notpresent | notpresent | 99.0 | 46.0 | 2.20 | 138.0 | 4.1 | 12.6 | 0 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 17 | 47.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 114.0 | 87.0 | 5.20 | 139.0 | 3.7 | 12.1 | 0 | 0 | 0.0 | yes | no | no | poor | no | no | ckd |
| 18 | 60.0 | 100.0 | 1.025 | 0.0 | 3.0 | miss | normal | notpresent | notpresent | 263.0 | 27.0 | 1.30 | 135.0 | 4.3 | 12.7 | 37 | 11400 | 4.3 | yes | yes | yes | good | no | no | ckd |
| 19 | 62.0 | 60.0 | 1.015 | 1.0 | 0.0 | miss | abnormal | present | notpresent | 100.0 | 31.0 | 1.60 | NaN | NaN | 10.3 | 30 | 5300 | 3.7 | yes | no | yes | good | no | no | ckd |
| 20 | 61.0 | 80.0 | 1.015 | 2.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 173.0 | 148.0 | 3.90 | 135.0 | 5.2 | 7.7 | 24 | 9200 | 3.2 | yes | yes | yes | poor | yes | yes | ckd |
| 21 | 60.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | NaN | 180.0 | 76.00 | 4.5 | NaN | 10.9 | 32 | 6200 | 3.6 | yes | yes | yes | good | no | no | ckd |
| 22 | 48.0 | 80.0 | 1.025 | 4.0 | 0.0 | normal | abnormal | notpresent | notpresent | 95.0 | 163.0 | 7.70 | 136.0 | 3.8 | 9.8 | 32 | 6900 | 3.4 | yes | no | no | good | no | yes | ckd |
| 23 | 21.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | poor | no | yes | ckd |
| 24 | 42.0 | 100.0 | 1.015 | 4.0 | 0.0 | normal | abnormal | notpresent | present | NaN | 50.0 | 1.40 | 129.0 | 4.0 | 11.1 | 39 | 8300 | 4.6 | yes | no | no | poor | no | no | ckd |
| 25 | 61.0 | 60.0 | 1.025 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 108.0 | 75.0 | 1.90 | 141.0 | 5.2 | 9.9 | 29 | 8400 | 3.7 | yes | yes | no | good | no | yes | ckd |
| 26 | 75.0 | 80.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 156.0 | 45.0 | 2.40 | 140.0 | 3.4 | 11.6 | 35 | 10300 | 4.0 | yes | yes | no | poor | no | no | ckd |
| 27 | 69.0 | 70.0 | 1.010 | 3.0 | 4.0 | normal | abnormal | notpresent | notpresent | 264.0 | 87.0 | 2.70 | 130.0 | 4.0 | 12.5 | 37 | 9600 | 4.1 | yes | yes | yes | good | yes | no | ckd |
| 28 | 75.0 | 70.0 | NaN | 1.0 | 3.0 | miss | miss | notpresent | notpresent | 123.0 | 31.0 | 1.40 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 29 | 68.0 | 70.0 | 1.005 | 1.0 | 0.0 | abnormal | abnormal | present | notpresent | NaN | 28.0 | 1.40 | NaN | NaN | 12.9 | 38 | 0 | 0.0 | no | no | yes | good | no | no | ckd |
| 30 | NaN | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 93.0 | 155.0 | 7.30 | 132.0 | 4.9 | NaN | 0 | 0 | 0.0 | yes | yes | no | good | no | no | ckd |
| 31 | 73.0 | 90.0 | 1.015 | 3.0 | 0.0 | miss | abnormal | present | notpresent | 107.0 | 33.0 | 1.50 | 141.0 | 4.6 | 10.1 | 30 | 7800 | 4.0 | no | no | no | poor | no | no | ckd |
| 32 | 61.0 | 90.0 | 1.010 | 1.0 | 1.0 | miss | normal | notpresent | notpresent | 159.0 | 39.0 | 1.50 | 133.0 | 4.9 | 11.3 | 34 | 9600 | 4.0 | yes | yes | no | poor | no | no | ckd |
| 33 | 60.0 | 100.0 | 1.020 | 2.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 140.0 | 55.0 | 2.50 | NaN | NaN | 10.1 | 29 | 0 | 0.0 | yes | no | no | poor | no | no | ckd |
| 34 | 70.0 | 70.0 | 1.010 | 1.0 | 0.0 | normal | miss | present | present | 171.0 | 153.0 | 5.20 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | yes | no | poor | no | no | ckd |
| 35 | 65.0 | 90.0 | 1.020 | 2.0 | 1.0 | abnormal | normal | notpresent | notpresent | 270.0 | 39.0 | 2.00 | NaN | NaN | 12.0 | 36 | 9800 | 4.9 | yes | yes | no | poor | no | yes | ckd |
| 36 | 76.0 | 70.0 | 1.015 | 1.0 | 0.0 | normal | normal | notpresent | notpresent | 92.0 | 29.0 | 1.80 | 133.0 | 3.9 | 10.3 | 32 | 0 | 0.0 | yes | no | no | good | no | no | ckd |
| 37 | 72.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 137.0 | 65.0 | 3.40 | 141.0 | 4.7 | 9.7 | 28 | 6900 | 2.5 | yes | yes | no | poor | no | yes | ckd\t |
| 38 | 69.0 | 80.0 | 1.020 | 3.0 | 0.0 | abnormal | normal | notpresent | notpresent | NaN | 103.0 | 4.10 | 132.0 | 5.9 | 12.5 | 0 | 0 | 0.0 | yes | no | no | good | no | no | ckd |
| 39 | 82.0 | 80.0 | 1.010 | 2.0 | 2.0 | normal | miss | notpresent | notpresent | 140.0 | 70.0 | 3.40 | 136.0 | 4.2 | 13.0 | 40 | 9800 | 4.2 | yes | yes | no | good | no | no | ckd |
| 40 | 46.0 | 90.0 | 1.010 | 2.0 | 0.0 | normal | abnormal | notpresent | notpresent | 99.0 | 80.0 | 2.10 | NaN | NaN | 11.1 | 32 | 9100 | 4.1 | yes | no | \tno | good | no | no | ckd |
| 41 | 45.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | 20.0 | 0.70 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 42 | 47.0 | 100.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 204.0 | 29.0 | 1.00 | 139.0 | 4.2 | 9.7 | 33 | 9200 | 4.5 | yes | no | no | good | no | yes | ckd |
| 43 | 35.0 | 80.0 | 1.010 | 1.0 | 0.0 | abnormal | miss | notpresent | notpresent | 79.0 | 202.0 | 10.80 | 134.0 | 3.4 | 7.9 | 24 | 7900 | 3.1 | no | yes | no | good | no | no | ckd |
| 44 | 54.0 | 80.0 | 1.010 | 3.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 207.0 | 77.0 | 6.30 | 134.0 | 4.8 | 9.7 | 28 | 0 | 0.0 | yes | yes | no | poor | yes | no | ckd |
| 45 | 54.0 | 80.0 | 1.020 | 3.0 | 0.0 | miss | abnormal | notpresent | notpresent | 208.0 | 89.0 | 5.90 | 130.0 | 4.9 | 9.3 | 0 | 0 | 0.0 | yes | yes | no | poor | yes | no | ckd |
| 46 | 48.0 | 70.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 124.0 | 24.0 | 1.20 | 142.0 | 4.2 | 12.4 | 37 | 6400 | 4.7 | no | yes | no | good | no | no | ckd |
| 47 | 11.0 | 80.0 | 1.010 | 3.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | 17.0 | 0.80 | NaN | NaN | 15.0 | 45 | 8600 | 0.0 | no | no | no | good | no | no | ckd |
| 48 | 73.0 | 70.0 | 1.005 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 70.0 | 32.0 | 0.90 | 125.0 | 4.0 | 10.0 | 29 | 18900 | 3.5 | yes | yes | no | good | yes | no | ckd |
| 49 | 60.0 | 70.0 | 1.010 | 2.0 | 0.0 | normal | abnormal | present | notpresent | 144.0 | 72.0 | 3.00 | NaN | NaN | 9.7 | 29 | 21600 | 3.5 | yes | yes | no | poor | no | yes | ckd |
| 50 | 53.0 | 60.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 91.0 | 114.0 | 3.25 | 142.0 | 4.3 | 8.6 | 28 | 11000 | 3.8 | yes | yes | no | poor | yes | yes | ckd |
| 51 | 54.0 | 100.0 | 1.015 | 3.0 | 0.0 | miss | normal | present | notpresent | 162.0 | 66.0 | 1.60 | 136.0 | 4.4 | 10.3 | 33 | 0 | 0.0 | yes | yes | no | poor | yes | no | ckd |
| 52 | 53.0 | 90.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | 38.0 | 2.20 | NaN | NaN | 10.9 | 34 | 4300 | 3.7 | no | no | no | poor | no | yes | ckd |
| 53 | 62.0 | 80.0 | 1.015 | 0.0 | 5.0 | miss | miss | notpresent | notpresent | 246.0 | 24.0 | 1.00 | NaN | NaN | 13.6 | 40 | 8500 | 4.7 | yes | yes | no | good | no | no | ckd |
| 54 | 63.0 | 80.0 | 1.010 | 2.0 | 2.0 | normal | miss | notpresent | notpresent | NaN | NaN | 3.40 | 136.0 | 4.2 | 13.0 | 40 | 9800 | 4.2 | yes | no | yes | good | no | no | ckd |
| 55 | 35.0 | 80.0 | 1.005 | 3.0 | 0.0 | abnormal | normal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | 9.5 | 28 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 56 | 76.0 | 70.0 | 1.015 | 3.0 | 4.0 | normal | abnormal | present | notpresent | NaN | 164.0 | 9.70 | 131.0 | 4.4 | 10.2 | 30 | 11300 | 3.4 | yes | yes | yes | poor | yes | no | ckd |
| 57 | 76.0 | 90.0 | NaN | NaN | NaN | miss | normal | notpresent | notpresent | 93.0 | 155.0 | 7.30 | 132.0 | 4.9 | NaN | 0 | 0 | 0.0 | yes | yes | yes | poor | no | no | ckd |
| 58 | 73.0 | 80.0 | 1.020 | 2.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 253.0 | 142.0 | 4.60 | 138.0 | 5.8 | 10.5 | 33 | 7200 | 4.3 | yes | yes | yes | good | no | no | ckd |
| 59 | 59.0 | 100.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | NaN | 96.0 | 6.40 | NaN | NaN | 6.6 | 0 | 0 | 0.0 | yes | yes | no | good | no | yes | ckd |
| 60 | 67.0 | 90.0 | 1.020 | 1.0 | 0.0 | miss | abnormal | present | notpresent | 141.0 | 66.0 | 3.20 | 138.0 | 6.6 | NaN | 0 | 0 | 0.0 | yes | no | no | good | no | no | ckd |
| 61 | 67.0 | 80.0 | 1.010 | 1.0 | 3.0 | normal | abnormal | notpresent | notpresent | 182.0 | 391.0 | 32.00 | 163.0 | 39.0 | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 62 | 15.0 | 60.0 | 1.020 | 3.0 | 0.0 | miss | normal | notpresent | notpresent | 86.0 | 15.0 | 0.60 | 138.0 | 4.0 | 11.0 | 33 | 7700 | 3.8 | yes | yes | no | good | no | no | ckd |
| 63 | 46.0 | 70.0 | 1.015 | 1.0 | 0.0 | abnormal | normal | notpresent | notpresent | 150.0 | 111.0 | 6.10 | 131.0 | 3.7 | 7.5 | 27 | 0 | 0.0 | no | no | no | good | no | yes | ckd |
| 64 | 55.0 | 80.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 146.0 | NaN | NaN | NaN | NaN | 9.8 | 0 | 0 | 0.0 | no | no | \tno | good | no | no | ckd |
| 65 | 44.0 | 90.0 | 1.010 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | 20.0 | 1.10 | NaN | NaN | 15.0 | 48 | 0 | 0.0 | no | \tno | no | good | no | no | ckd |
| 66 | 67.0 | 70.0 | 1.020 | 2.0 | 0.0 | abnormal | normal | notpresent | notpresent | 150.0 | 55.0 | 1.60 | 131.0 | 4.8 | NaN | -999 | 0 | 0.0 | yes | yes | no | good | yes | no | ckd |
| 67 | 45.0 | 80.0 | 1.020 | 3.0 | 0.0 | normal | abnormal | notpresent | notpresent | 425.0 | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | poor | no | no | ckd |
| 68 | 65.0 | 70.0 | 1.010 | 2.0 | 0.0 | miss | normal | present | notpresent | 112.0 | 73.0 | 3.30 | NaN | NaN | 10.9 | 37 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 69 | 26.0 | 70.0 | 1.015 | 0.0 | 4.0 | miss | normal | notpresent | notpresent | 250.0 | 20.0 | 1.10 | NaN | NaN | 15.6 | 52 | 6900 | 6.0 | no | yes | no | good | no | no | ckd |
| 70 | 61.0 | 80.0 | 1.015 | 0.0 | 4.0 | miss | normal | notpresent | notpresent | 360.0 | 19.0 | 0.70 | 137.0 | 4.4 | 15.2 | 44 | 8300 | 5.2 | yes | yes | no | good | no | no | ckd |
| 71 | 46.0 | 60.0 | 1.010 | 1.0 | 0.0 | normal | normal | notpresent | notpresent | 163.0 | 92.0 | 3.30 | 141.0 | 4.0 | 9.8 | 28 | 14600 | 3.2 | yes | yes | no | good | no | no | ckd |
| 72 | 64.0 | 90.0 | 1.010 | 3.0 | 3.0 | miss | abnormal | present | notpresent | NaN | 35.0 | 1.30 | NaN | NaN | 10.3 | 0 | 0 | 0.0 | yes | yes | no | good | yes | no | ckd |
| 73 | NaN | 100.0 | 1.015 | 2.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 129.0 | 107.0 | 6.70 | 132.0 | 4.4 | 4.8 | 14 | 6300 | 0.0 | yes | no | no | good | yes | yes | ckd |
| 74 | 56.0 | 90.0 | 1.015 | 2.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 129.0 | 107.0 | 6.70 | 131.0 | 4.8 | 9.1 | 29 | 6400 | 3.4 | yes | no | no | good | no | no | ckd |
| 75 | 5.0 | NaN | 1.015 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | 16.0 | 0.70 | 138.0 | 3.2 | 8.1 | 0 | 0 | 0.0 | no | no | no | good | no | yes | ckd |
| 76 | 48.0 | 80.0 | 1.005 | 4.0 | 0.0 | abnormal | abnormal | notpresent | present | 133.0 | 139.0 | 8.50 | 132.0 | 5.5 | 10.3 | 36 | 6200 | 4.0 | no | yes | no | good | yes | no | ckd |
| 77 | 67.0 | 70.0 | 1.010 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 102.0 | 48.0 | 3.20 | 137.0 | 5.0 | 11.9 | 34 | 7100 | 3.7 | yes | yes | no | good | yes | no | ckd |
| 78 | 70.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 158.0 | 85.0 | 3.20 | 141.0 | 3.5 | 10.1 | 30 | 0 | 0.0 | yes | no | no | good | yes | no | ckd |
| 79 | 56.0 | 80.0 | 1.010 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 165.0 | 55.0 | 1.80 | NaN | NaN | 13.5 | 40 | 11800 | 5.0 | yes | yes | no | poor | yes | no | ckd |
| 80 | 74.0 | 80.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 132.0 | 98.0 | 2.80 | 133.0 | 5.0 | 10.8 | 31 | 9400 | 3.8 | yes | yes | no | good | no | no | ckd |
| 81 | 45.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 360.0 | 45.0 | 2.40 | 128.0 | 4.4 | 8.3 | 29 | 5500 | 3.7 | yes | yes | no | good | no | no | ckd |
| 82 | 38.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 104.0 | 77.0 | 1.90 | 140.0 | 3.9 | NaN | 0 | 0 | 0.0 | yes | no | no | poor | yes | no | ckd |
| 83 | 48.0 | 70.0 | 1.015 | 1.0 | 0.0 | normal | normal | notpresent | notpresent | 127.0 | 19.0 | 1.00 | 134.0 | 3.6 | NaN | 0 | 0 | 0.0 | yes | yes | no | good | no | no | ckd |
| 84 | 59.0 | 70.0 | 1.010 | 3.0 | 0.0 | normal | abnormal | notpresent | notpresent | 76.0 | 186.0 | 15.00 | 135.0 | 7.6 | 7.1 | 22 | 3800 | 2.1 | yes | no | no | poor | yes | yes | ckd |
| 85 | 70.0 | 70.0 | 1.015 | 2.0 | NaN | miss | miss | notpresent | notpresent | NaN | 46.0 | 1.50 | NaN | NaN | 9.9 | 0 | 0 | 0.0 | no | yes | no | poor | yes | no | ckd |
| 86 | 56.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 415.0 | 37.0 | 1.90 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 87 | 70.0 | 100.0 | 1.005 | 1.0 | 0.0 | normal | abnormal | present | notpresent | 169.0 | 47.0 | 2.90 | NaN | NaN | 11.1 | 32 | 5800 | 5.0 | yes | yes | no | poor | no | no | ckd |
| 88 | 58.0 | 110.0 | 1.010 | 4.0 | 0.0 | miss | normal | notpresent | notpresent | 251.0 | 52.0 | 2.20 | NaN | NaN | NaN | 0 | 13200 | 4.7 | yes | \tyes | no | good | no | no | ckd |
| 89 | 50.0 | 70.0 | 1.020 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 109.0 | 32.0 | 1.40 | 139.0 | 4.7 | NaN | 0 | 0 | 0.0 | no | no | no | poor | no | no | ckd |
| 90 | 63.0 | 100.0 | 1.010 | 2.0 | 2.0 | normal | normal | notpresent | present | 280.0 | 35.0 | 3.20 | 143.0 | 3.5 | 13.0 | 40 | 9800 | 4.2 | yes | no | yes | good | no | no | ckd |
| 91 | 56.0 | 70.0 | 1.015 | 4.0 | 1.0 | abnormal | normal | notpresent | notpresent | 210.0 | 26.0 | 1.70 | 136.0 | 3.8 | 16.1 | 52 | 12500 | 5.6 | no | no | no | good | no | no | ckd |
| 92 | 71.0 | 70.0 | 1.010 | 3.0 | 0.0 | normal | abnormal | present | present | 219.0 | 82.0 | 3.60 | 133.0 | 4.4 | 10.4 | 33 | 5600 | 3.6 | yes | yes | yes | good | no | no | ckd |
| 93 | 73.0 | 100.0 | 1.010 | 3.0 | 2.0 | abnormal | abnormal | present | notpresent | 295.0 | 90.0 | 5.60 | 140.0 | 2.9 | 9.2 | 30 | 7000 | 3.2 | yes | yes | yes | poor | no | no | ckd |
| 94 | 65.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 93.0 | 66.0 | 1.60 | 137.0 | 4.5 | 11.6 | 36 | 11900 | 3.9 | no | yes | no | good | no | no | ckd |
| 95 | 62.0 | 90.0 | 1.015 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 94.0 | 25.0 | 1.10 | 131.0 | 3.7 | NaN | 0 | 0 | 0.0 | yes | no | no | good | yes | yes | ckd |
| 96 | 60.0 | 80.0 | 1.010 | 1.0 | 1.0 | miss | normal | notpresent | notpresent | 172.0 | 32.0 | 2.70 | NaN | NaN | 11.2 | 36 | 0 | 0.0 | no | yes | yes | poor | no | no | ckd |
| 97 | 65.0 | 60.0 | 1.015 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 91.0 | 51.0 | 2.20 | 132.0 | 3.8 | 10.0 | 32 | 9100 | 4.0 | yes | yes | no | poor | yes | no | ckd |
| 98 | 50.0 | 140.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 101.0 | 106.0 | 6.50 | 135.0 | 4.3 | 6.2 | 18 | 5800 | 2.3 | yes | yes | no | poor | no | yes | ckd |
| 99 | 56.0 | 180.0 | NaN | 0.0 | 4.0 | miss | abnormal | notpresent | notpresent | 298.0 | 24.0 | 1.20 | 139.0 | 3.9 | 11.2 | 32 | 10400 | 4.2 | yes | yes | no | poor | yes | no | ckd |
| 100 | 34.0 | 70.0 | 1.015 | 4.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 153.0 | 22.0 | 0.90 | 133.0 | 3.8 | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 101 | 71.0 | 90.0 | 1.015 | 2.0 | 0.0 | miss | abnormal | present | present | 88.0 | 80.0 | 4.40 | 139.0 | 5.7 | 11.3 | 33 | 10700 | 3.9 | no | no | no | good | no | no | ckd |
| 102 | 17.0 | 60.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 92.0 | 32.0 | 2.10 | 141.0 | 4.2 | 13.9 | 52 | 7000 | 0.0 | no | no | no | good | no | no | ckd |
| 103 | 76.0 | 70.0 | 1.015 | 2.0 | 0.0 | normal | abnormal | present | notpresent | 226.0 | 217.0 | 10.20 | NaN | NaN | 10.2 | 36 | 12700 | 4.2 | yes | no | no | poor | yes | yes | ckd |
| 104 | 55.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 143.0 | 88.0 | 2.00 | NaN | NaN | NaN | 0 | 0 | 0.0 | yes | yes | no | poor | yes | no | ckd |
| 105 | 65.0 | 80.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 115.0 | 32.0 | 11.50 | 139.0 | 4.0 | 14.1 | 42 | 6800 | 5.2 | no | no | no | good | no | no | ckd |
| 106 | 50.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 89.0 | 118.0 | 6.10 | 127.0 | 4.4 | 6.0 | 17 | 6500 | 0.0 | yes | yes | no | good | yes | yes | ckd |
| 107 | 55.0 | 100.0 | 1.015 | 1.0 | 4.0 | normal | miss | notpresent | notpresent | 297.0 | 53.0 | 2.80 | 139.0 | 4.5 | 11.2 | 34 | 13600 | 4.4 | yes | yes | no | good | no | no | ckd |
| 108 | 45.0 | 80.0 | 1.015 | 0.0 | 0.0 | miss | abnormal | notpresent | notpresent | 107.0 | 15.0 | 1.00 | 141.0 | 4.2 | 11.8 | 37 | 10200 | 4.2 | no | no | no | good | no | no | ckd |
| 109 | 54.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 233.0 | 50.1 | 1.90 | NaN | NaN | 11.7 | 0 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 110 | 63.0 | 90.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 123.0 | 19.0 | 2.00 | 142.0 | 3.8 | 11.7 | 34 | 11400 | 4.7 | no | no | no | good | no | no | ckd |
| 111 | 65.0 | 80.0 | 1.010 | 3.0 | 3.0 | miss | normal | notpresent | notpresent | 294.0 | 71.0 | 4.40 | 128.0 | 5.4 | 10.0 | 32 | 9000 | 3.9 | yes | yes | yes | good | no | no | ckd |
| 112 | NaN | 60.0 | 1.015 | 3.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | NaN | 34.0 | 1.20 | NaN | NaN | 10.8 | 33 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 113 | 61.0 | 90.0 | 1.015 | 0.0 | 2.0 | miss | normal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 9800 | 0.0 | no | yes | no | poor | no | yes | ckd |
| 114 | 12.0 | 60.0 | 1.015 | 3.0 | 0.0 | abnormal | abnormal | present | notpresent | NaN | 51.0 | 1.80 | NaN | NaN | 12.1 | 0 | 10300 | 0.0 | no | no | no | good | no | no | ckd |
| 115 | 47.0 | 80.0 | 1.010 | 0.0 | 0.0 | miss | abnormal | notpresent | notpresent | NaN | 28.0 | 0.90 | NaN | NaN | 12.4 | 44 | 5600 | 4.3 | no | no | no | good | no | yes | ckd |
| 116 | NaN | 70.0 | 1.015 | 4.0 | 0.0 | abnormal | normal | notpresent | notpresent | 104.0 | 16.0 | 0.50 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 117 | NaN | 70.0 | 1.020 | 0.0 | 0.0 | miss | miss | notpresent | notpresent | 219.0 | 36.0 | 1.30 | 139.0 | 3.7 | 12.5 | 37 | 9800 | 4.4 | no | no | no | good | no | no | ckd |
| 118 | 55.0 | 70.0 | 1.010 | 3.0 | 0.0 | miss | normal | notpresent | notpresent | 99.0 | 25.0 | 1.20 | NaN | NaN | 11.4 | 0 | 0 | 0.0 | no | no | no | poor | yes | no | ckd |
| 119 | 60.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 140.0 | 27.0 | 1.20 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 120 | 72.0 | 90.0 | 1.025 | 1.0 | 3.0 | miss | normal | notpresent | notpresent | 323.0 | 40.0 | 2.20 | 137.0 | 5.3 | 12.6 | 0 | 0 | 0.0 | no | yes | yes | poor | no | no | ckd |
| 121 | 54.0 | 60.0 | NaN | 3.0 | NaN | miss | miss | notpresent | notpresent | 125.0 | 21.0 | 1.30 | 137.0 | 3.4 | 15.0 | 46 | 0 | 0.0 | yes | yes | no | good | yes | no | ckd |
| 122 | 34.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | NaN | 219.0 | 12.20 | 130.0 | 3.8 | 6.0 | 0 | 0 | 0.0 | yes | no | no | good | no | yes | ckd |
| 123 | 43.0 | 80.0 | 1.015 | 2.0 | 3.0 | miss | abnormal | present | present | NaN | 30.0 | 1.10 | NaN | NaN | 14.0 | 42 | 14900 | 0.0 | no | no | no | good | no | no | ckd |
| 124 | 65.0 | 100.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 90.0 | 98.0 | 2.50 | NaN | NaN | 9.1 | 28 | 5500 | 3.6 | yes | no | no | good | no | no | ckd |
| 125 | 72.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 308.0 | 36.0 | 2.50 | 131.0 | 4.3 | NaN | 0 | 0 | 0.0 | yes | yes | no | poor | no | no | ckd |
| 126 | 70.0 | 90.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 144.0 | 125.0 | 4.00 | 136.0 | 4.6 | 12.0 | 37 | 8200 | 4.5 | yes | yes | no | poor | yes | no | ckd |
| 127 | 71.0 | 60.0 | 1.015 | 4.0 | 0.0 | normal | normal | notpresent | notpresent | 118.0 | 125.0 | 5.30 | 136.0 | 4.9 | 11.4 | 35 | 15200 | 4.3 | yes | yes | no | poor | yes | no | ckd |
| 128 | 52.0 | 90.0 | 1.015 | 4.0 | 3.0 | normal | abnormal | notpresent | notpresent | 224.0 | 166.0 | 5.60 | 133.0 | 47.0 | 8.1 | 23 | 5000 | 2.9 | yes | yes | no | good | no | yes | ckd |
| 129 | 75.0 | 70.0 | 1.025 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 158.0 | 49.0 | 1.40 | 135.0 | 4.7 | 11.1 | 0 | 0 | 0.0 | yes | no | no | poor | yes | no | ckd |
| 130 | 50.0 | 90.0 | 1.010 | 2.0 | 0.0 | normal | abnormal | present | present | 128.0 | 208.0 | 9.20 | 134.0 | 4.8 | 8.2 | 22 | 16300 | 2.7 | no | no | no | poor | yes | yes | ckd |
| 131 | 5.0 | 50.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | 25.0 | 0.60 | NaN | NaN | 11.8 | 36 | 12400 | 0.0 | no | no | no | good | no | no | ckd |
| 132 | 50.0 | NaN | NaN | NaN | NaN | normal | miss | notpresent | notpresent | 219.0 | 176.0 | 13.80 | 136.0 | 4.5 | 8.6 | 24 | 13200 | 2.7 | yes | no | no | good | yes | yes | ckd |
| 133 | 70.0 | 100.0 | 1.015 | 4.0 | 0.0 | normal | normal | notpresent | notpresent | 118.0 | 125.0 | 5.30 | 136.0 | 4.9 | 12.0 | 37 | 8400 | 8.0 | yes | no | no | good | no | no | ckd |
| 134 | 47.0 | 100.0 | 1.010 | NaN | NaN | normal | miss | notpresent | notpresent | 122.0 | NaN | 16.90 | 138.0 | 5.2 | 10.8 | 33 | 10200 | 3.8 | no | yes | no | good | no | no | ckd |
| 135 | 48.0 | 80.0 | 1.015 | 0.0 | 2.0 | miss | normal | notpresent | notpresent | 214.0 | 24.0 | 1.30 | 140.0 | 4.0 | 13.2 | 39 | 0 | 0.0 | no | yes | no | poor | no | no | ckd |
| 136 | 46.0 | 90.0 | 1.020 | NaN | NaN | miss | normal | notpresent | notpresent | 213.0 | 68.0 | 2.80 | 146.0 | 6.3 | 9.3 | 0 | 0 | 0.0 | yes | yes | no | good | no | no | ckd |
| 137 | 45.0 | 60.0 | 1.010 | 2.0 | 0.0 | normal | abnormal | present | notpresent | 268.0 | 86.0 | 4.00 | 134.0 | 5.1 | 10.0 | 29 | 9200 | 0.0 | yes | yes | no | good | no | no | ckd |
| 138 | 73.0 | NaN | 1.010 | 1.0 | 0.0 | miss | miss | notpresent | notpresent | 95.0 | 51.0 | 1.60 | 142.0 | 3.5 | NaN | 0 | 0 | 0.0 | no | \tno | no | good | no | no | ckd |
| 139 | 41.0 | 70.0 | 1.015 | 2.0 | 0.0 | miss | abnormal | notpresent | present | NaN | 68.0 | 2.80 | 132.0 | 4.1 | 11.1 | 33 | 0 | 0.0 | yes | no | no | good | yes | yes | ckd |
| 140 | 69.0 | 70.0 | 1.010 | 0.0 | 4.0 | miss | normal | notpresent | notpresent | 256.0 | 40.0 | 1.20 | 142.0 | 5.6 | NaN | 0 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 141 | 67.0 | 70.0 | 1.010 | 1.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | 106.0 | 6.00 | 137.0 | 4.9 | 6.1 | 19 | 6500 | 0.0 | yes | no | no | good | no | yes | ckd |
| 142 | 72.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 84.0 | 145.0 | 7.10 | 135.0 | 5.3 | NaN | 0 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 143 | 41.0 | 80.0 | 1.015 | 1.0 | 4.0 | abnormal | normal | notpresent | notpresent | 210.0 | 165.0 | 18.00 | 135.0 | 4.7 | NaN | 0 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 144 | 60.0 | 90.0 | 1.010 | 2.0 | 0.0 | abnormal | normal | notpresent | notpresent | 105.0 | 53.0 | 2.30 | 136.0 | 5.2 | 11.1 | 33 | 10500 | 4.1 | no | no | no | good | no | no | ckd |
| 145 | 57.0 | 90.0 | 1.015 | 5.0 | 0.0 | abnormal | abnormal | notpresent | present | NaN | 322.0 | 13.00 | 126.0 | 4.8 | 8.0 | 24 | 4200 | 3.3 | yes | yes | yes | poor | yes | yes | ckd |
| 146 | 53.0 | 100.0 | 1.010 | 1.0 | 3.0 | abnormal | normal | notpresent | notpresent | 213.0 | 23.0 | 1.00 | 139.0 | 4.0 | NaN | 0 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 147 | 60.0 | 60.0 | 1.010 | 3.0 | 1.0 | normal | abnormal | present | notpresent | 288.0 | 36.0 | 1.70 | 130.0 | 3.0 | 7.9 | 25 | 15200 | 3.0 | yes | no | no | poor | no | yes | ckd |
| 148 | 69.0 | 60.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 171.0 | 26.0 | 48.10 | NaN | NaN | NaN | 0 | 0 | 0.0 | yes | no | no | poor | no | no | ckd |
| 149 | 65.0 | 70.0 | 1.020 | 1.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 139.0 | 29.0 | 1.00 | NaN | NaN | 10.5 | 32 | 0 | 0.0 | yes | no | no | good | yes | no | ckd |
| 150 | 8.0 | 60.0 | 1.025 | 3.0 | 0.0 | normal | normal | notpresent | notpresent | 78.0 | 27.0 | 0.90 | NaN | NaN | 12.3 | 41 | 6700 | 0.0 | no | no | no | poor | yes | no | ckd |
| 151 | 76.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 172.0 | 46.0 | 1.70 | 141.0 | 5.5 | 9.6 | 30 | 0 | 0.0 | yes | yes | no | good | no | yes | ckd |
| 152 | 39.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 121.0 | 20.0 | 0.80 | 133.0 | 3.5 | 10.9 | 32 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 153 | 55.0 | 90.0 | 1.010 | 2.0 | 1.0 | abnormal | abnormal | notpresent | notpresent | 273.0 | 235.0 | 14.20 | 132.0 | 3.4 | 8.3 | 22 | 14600 | 2.9 | yes | yes | no | poor | yes | yes | ckd |
| 154 | 56.0 | 90.0 | 1.005 | 4.0 | 3.0 | abnormal | abnormal | notpresent | notpresent | 242.0 | 132.0 | 16.40 | 140.0 | 4.2 | 8.4 | 26 | 0 | 3.0 | yes | yes | no | poor | yes | yes | ckd |
| 155 | 50.0 | 70.0 | 1.020 | 3.0 | 0.0 | abnormal | normal | present | present | 123.0 | 40.0 | 1.80 | NaN | NaN | 11.1 | 36 | 4700 | 0.0 | no | no | no | good | no | no | ckd |
| 156 | 66.0 | 90.0 | 1.015 | 2.0 | 0.0 | miss | normal | notpresent | present | 153.0 | 76.0 | 3.30 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | poor | no | no | ckd |
| 157 | 62.0 | 70.0 | 1.025 | 3.0 | 0.0 | normal | abnormal | notpresent | notpresent | 122.0 | 42.0 | 1.70 | 136.0 | 4.7 | 12.6 | 39 | 7900 | 3.9 | yes | yes | no | good | no | no | ckd |
| 158 | 71.0 | 60.0 | 1.020 | 3.0 | 2.0 | normal | normal | present | notpresent | 424.0 | 48.0 | 1.50 | 132.0 | 4.0 | 10.9 | 31 | 0 | 0.0 | yes | yes | yes | good | no | no | ckd |
| 159 | 59.0 | 80.0 | 1.010 | 1.0 | 0.0 | abnormal | normal | notpresent | notpresent | 303.0 | 35.0 | 1.30 | 122.0 | 3.5 | 10.4 | 35 | 10900 | 4.3 | no | yes | no | poor | no | no | ckd |
| 160 | 81.0 | 60.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 148.0 | 39.0 | 2.10 | 147.0 | 4.2 | 10.9 | 35 | 9400 | 2.4 | yes | yes | yes | poor | yes | no | ckd |
| 161 | 62.0 | NaN | 1.015 | 3.0 | 0.0 | abnormal | miss | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | 14.3 | 42 | 10200 | 4.8 | yes | yes | no | good | no | no | ckd |
| 162 | 59.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 204.0 | 34.0 | 1.50 | 124.0 | 4.1 | 9.8 | 37 | 6000 | -999.0 | no | yes | no | good | no | no | ckd |
| 163 | 46.0 | 80.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 160.0 | 40.0 | 2.00 | 140.0 | 4.1 | 9.0 | 27 | 8100 | 3.2 | yes | no | no | poor | no | yes | ckd |
| 164 | 14.0 | NaN | 1.015 | 0.0 | 0.0 | miss | miss | notpresent | notpresent | 192.0 | 15.0 | 0.80 | 137.0 | 4.2 | 14.3 | 40 | 9500 | 5.4 | no | yes | no | poor | yes | no | ckd |
| 165 | 60.0 | 80.0 | 1.020 | 0.0 | 2.0 | miss | miss | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 166 | 27.0 | 60.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 76.0 | 44.0 | 3.90 | 127.0 | 4.3 | NaN | 0 | 0 | 0.0 | no | no | no | poor | yes | yes | ckd |
| 167 | 34.0 | 70.0 | 1.020 | 0.0 | 0.0 | abnormal | normal | notpresent | notpresent | 139.0 | 19.0 | 0.90 | NaN | NaN | 12.7 | 42 | 2200 | 0.0 | no | no | no | poor | no | no | ckd |
| 168 | 65.0 | 70.0 | 1.015 | 4.0 | 4.0 | miss | normal | present | notpresent | 307.0 | 28.0 | 1.50 | NaN | NaN | 11.0 | 39 | 6700 | 0.0 | yes | yes | no | good | no | no | ckd |
| 169 | NaN | 70.0 | 1.010 | 0.0 | 2.0 | miss | normal | notpresent | notpresent | 220.0 | 68.0 | 2.80 | NaN | NaN | 8.7 | 27 | 0 | 0.0 | yes | yes | no | good | no | yes | ckd |
| 170 | 66.0 | 70.0 | 1.015 | 2.0 | 5.0 | miss | normal | notpresent | notpresent | 447.0 | 41.0 | 1.70 | 131.0 | 3.9 | 12.5 | 33 | 9600 | 4.4 | yes | yes | no | good | no | no | ckd |
| 171 | 83.0 | 70.0 | 1.020 | 3.0 | 0.0 | normal | normal | notpresent | notpresent | 102.0 | 60.0 | 2.60 | 115.0 | 5.7 | 8.7 | 26 | 12800 | 3.1 | yes | no | no | poor | no | yes | ckd |
| 172 | 62.0 | 80.0 | 1.010 | 1.0 | 2.0 | miss | miss | notpresent | notpresent | 309.0 | 113.0 | 2.90 | 130.0 | 2.5 | 10.6 | 34 | 12800 | 4.9 | no | no | no | good | no | no | ckd |
| 173 | 17.0 | 70.0 | 1.015 | 1.0 | 0.0 | abnormal | normal | notpresent | notpresent | 22.0 | 1.5 | 7.30 | 145.0 | 2.8 | 13.1 | 41 | 11200 | 0.0 | no | no | no | good | no | no | ckd |
| 174 | 54.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 111.0 | 146.0 | 7.50 | 141.0 | 4.7 | 11.0 | 35 | 8600 | 4.6 | no | no | no | good | no | no | ckd |
| 175 | 60.0 | 50.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 261.0 | 58.0 | 2.20 | 113.0 | 3.0 | NaN | 0 | 4200 | 3.4 | yes | no | no | good | no | no | ckd |
| 176 | 21.0 | 90.0 | 1.010 | 4.0 | 0.0 | normal | abnormal | present | present | 107.0 | 40.0 | 1.70 | 125.0 | 3.5 | 8.3 | 23 | 12400 | 3.9 | no | no | no | good | no | yes | ckd |
| 177 | 65.0 | 80.0 | 1.015 | 2.0 | 1.0 | normal | normal | present | notpresent | 215.0 | 133.0 | 2.50 | NaN | NaN | 13.2 | 41 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 178 | 42.0 | 90.0 | 1.020 | 2.0 | 0.0 | abnormal | abnormal | present | notpresent | 93.0 | 153.0 | 2.70 | 139.0 | 4.3 | 9.8 | 34 | 9800 | 0.0 | no | no | no | poor | yes | yes | ckd |
| 179 | 72.0 | 90.0 | 1.010 | 2.0 | 0.0 | miss | abnormal | present | notpresent | 124.0 | 53.0 | 2.30 | NaN | NaN | 11.9 | 39 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 180 | 73.0 | 90.0 | 1.010 | 1.0 | 4.0 | abnormal | abnormal | present | notpresent | 234.0 | 56.0 | 1.90 | NaN | NaN | 10.3 | 28 | 0 | 0.0 | no | yes | no | good | no | no | ckd |
| 181 | 45.0 | 70.0 | 1.025 | 2.0 | 0.0 | normal | abnormal | present | notpresent | 117.0 | 52.0 | 2.20 | 136.0 | 3.8 | 10.0 | 30 | 19100 | 3.7 | no | no | no | good | no | no | ckd |
| 182 | 61.0 | 80.0 | 1.020 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 131.0 | 23.0 | 0.80 | 140.0 | 4.1 | 11.3 | 35 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 183 | 30.0 | 70.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 101.0 | 106.0 | 6.50 | 135.0 | 4.3 | NaN | 0 | 0 | 0.0 | no | no | no | poor | no | no | ckd |
| 184 | 54.0 | 60.0 | 1.015 | 3.0 | 2.0 | miss | abnormal | notpresent | notpresent | 352.0 | 137.0 | 3.30 | 133.0 | 4.5 | 11.3 | 31 | 5800 | 3.6 | yes | yes | yes | poor | yes | no | ckd |
| 185 | 4.0 | NaN | 1.020 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 99.0 | 23.0 | 0.60 | 138.0 | 4.4 | 12.0 | 34 | -999 | 0.0 | no | no | no | good | no | no | ckd |
| 186 | 8.0 | 50.0 | 1.020 | 4.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | 46.0 | 1.00 | 135.0 | 3.8 | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 187 | 3.0 | NaN | 1.010 | 2.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | 22.0 | 0.70 | NaN | NaN | 10.7 | 34 | 12300 | 0.0 | no | no | no | good | no | no | ckd |
| 188 | 8.0 | NaN | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 80.0 | 66.0 | 2.50 | 142.0 | 3.6 | 12.2 | 38 | 0 | 0.0 | no | \tno | no | good | no | no | ckd |
| 189 | 64.0 | 60.0 | 1.010 | 4.0 | 1.0 | abnormal | abnormal | notpresent | present | 239.0 | 58.0 | 4.30 | 137.0 | 5.4 | 9.5 | 29 | 7500 | 3.4 | yes | yes | no | poor | yes | no | ckd |
| 190 | 6.0 | 60.0 | 1.010 | 4.0 | 0.0 | abnormal | abnormal | notpresent | present | 94.0 | 67.0 | 1.00 | 135.0 | 4.9 | 9.9 | 30 | 16700 | 4.8 | no | no | no | poor | no | no | ckd |
| 191 | NaN | 70.0 | 1.010 | 3.0 | 0.0 | normal | normal | notpresent | notpresent | 110.0 | 115.0 | 6.00 | 134.0 | 2.7 | 9.1 | 26 | 9200 | 3.4 | yes | yes | no | poor | no | no | ckd |
| 192 | 46.0 | 110.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 130.0 | 16.0 | 0.90 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 193 | 32.0 | 90.0 | 1.025 | 1.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | NaN | 223.0 | 18.10 | 113.0 | 6.5 | 5.5 | 15 | 2600 | 2.8 | yes | yes | no | poor | yes | yes | ckd |
| 194 | 80.0 | 70.0 | 1.010 | 2.0 | NaN | miss | abnormal | notpresent | notpresent | NaN | 49.0 | 1.20 | NaN | NaN | NaN | 0 | 0 | 0.0 | yes | \tyes | no | good | no | no | ckd |
| 195 | 70.0 | 90.0 | 1.020 | 2.0 | 1.0 | abnormal | abnormal | notpresent | present | 184.0 | 98.6 | 3.30 | 138.0 | 3.9 | 5.8 | 0 | 0 | 0.0 | yes | yes | yes | poor | no | no | ckd |
| 196 | 49.0 | 100.0 | 1.010 | 3.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | 129.0 | 158.0 | 11.80 | 122.0 | 3.2 | 8.1 | 24 | 9600 | 3.5 | yes | yes | no | poor | yes | yes | ckd |
| 197 | 57.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | NaN | 111.0 | 9.30 | 124.0 | 5.3 | 6.8 | 0 | 4300 | 3.0 | yes | yes | no | good | no | yes | ckd |
| 198 | 59.0 | 100.0 | 1.020 | 4.0 | 2.0 | normal | normal | notpresent | notpresent | 252.0 | 40.0 | 3.20 | 137.0 | 4.7 | 11.2 | 30 | 26400 | 3.9 | yes | yes | no | poor | yes | no | ckd |
| 199 | 65.0 | 80.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 92.0 | 37.0 | 1.50 | 140.0 | 5.2 | 8.8 | 25 | 10700 | 3.2 | yes | no | yes | good | yes | no | ckd |
| 200 | 90.0 | 90.0 | 1.025 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 139.0 | 89.0 | 3.00 | 140.0 | 4.1 | 12.0 | 37 | 7900 | 3.9 | yes | yes | no | good | no | no | ckd |
| 201 | 64.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 113.0 | 94.0 | 7.30 | 137.0 | 4.3 | 7.9 | 21 | 0 | 0.0 | yes | yes | yes | good | yes | yes | ckd |
| 202 | 78.0 | 60.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 114.0 | 74.0 | 2.90 | 135.0 | 5.9 | 8.0 | 24 | 0 | 0.0 | no | yes | no | good | no | yes | ckd |
| 203 | NaN | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 207.0 | 80.0 | 6.80 | 142.0 | 5.5 | 8.5 | 0 | 0 | 0.0 | yes | yes | no | good | no | yes | ckd |
| 204 | 65.0 | 90.0 | 1.010 | 4.0 | 2.0 | normal | normal | notpresent | notpresent | 172.0 | 82.0 | 13.50 | 145.0 | 6.3 | 8.8 | 31 | 0 | 0.0 | yes | yes | no | good | yes | yes | ckd |
| 205 | 61.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 100.0 | 28.0 | 2.10 | NaN | NaN | 12.6 | 43 | 0 | 0.0 | yes | yes | no | good | no | no | ckd |
| 206 | 60.0 | 70.0 | 1.010 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 109.0 | 96.0 | 3.90 | 135.0 | 4.0 | 13.8 | 41 | 0 | 0.0 | yes | no | no | good | no | no | ckd |
| 207 | 50.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 230.0 | 50.0 | 2.20 | NaN | NaN | 12.0 | 41 | 10400 | 4.6 | yes | yes | no | good | no | no | ckd |
| 208 | 67.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 341.0 | 37.0 | 1.50 | NaN | NaN | 12.3 | 41 | 6900 | 4.9 | yes | yes | no | good | no | yes | ckd |
| 209 | 19.0 | 70.0 | 1.020 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | 11.5 | 0 | 6900 | 0.0 | no | no | no | good | no | no | ckd |
| 210 | 59.0 | 100.0 | 1.015 | 4.0 | 2.0 | normal | normal | notpresent | notpresent | 255.0 | 132.0 | 12.80 | 135.0 | 5.7 | 7.3 | 20 | 9800 | 3.9 | yes | yes | yes | good | no | yes | ckd |
| 211 | 54.0 | 120.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 103.0 | 18.0 | 1.20 | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 212 | 40.0 | 70.0 | 1.015 | 3.0 | 4.0 | normal | normal | notpresent | notpresent | 253.0 | 150.0 | 11.90 | 132.0 | 5.6 | 10.9 | 31 | 8800 | 3.4 | yes | yes | no | poor | yes | no | ckd |
| 213 | 55.0 | 80.0 | 1.010 | 3.0 | 1.0 | normal | abnormal | present | present | 214.0 | 73.0 | 3.90 | 137.0 | 4.9 | 10.9 | 34 | 7400 | 3.7 | yes | yes | no | good | yes | no | ckd |
| 214 | 68.0 | 80.0 | 1.015 | 0.0 | 0.0 | miss | abnormal | notpresent | notpresent | 171.0 | 30.0 | 1.00 | NaN | NaN | 13.7 | 43 | 4900 | 5.2 | no | yes | no | good | no | no | ckd |
| 215 | 2.0 | NaN | 1.010 | 3.0 | 0.0 | normal | abnormal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 216 | 64.0 | 70.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 107.0 | 15.0 | NaN | NaN | NaN | 12.8 | 38 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 217 | 63.0 | 100.0 | 1.010 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 78.0 | 61.0 | 1.80 | 141.0 | 4.4 | 12.2 | 36 | 10500 | 4.3 | no | yes | no | good | no | no | ckd |
| 218 | 33.0 | 90.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 92.0 | 19.0 | 0.80 | NaN | NaN | 11.8 | 34 | 7000 | 0.0 | no | no | no | good | no | no | ckd |
| 219 | 68.0 | 90.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 238.0 | 57.0 | 2.50 | NaN | NaN | 9.8 | 28 | 8000 | 3.3 | yes | yes | no | poor | no | no | ckd |
| 220 | 36.0 | 80.0 | 1.010 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 103.0 | NaN | NaN | NaN | NaN | 11.9 | 36 | 8800 | 0.0 | no | no | no | good | no | no | ckd |
| 221 | 66.0 | 70.0 | 1.020 | 1.0 | 0.0 | normal | miss | notpresent | notpresent | 248.0 | 30.0 | 1.70 | 138.0 | 5.3 | NaN | 0 | 0 | 0.0 | yes | yes | no | good | no | no | ckd |
| 222 | 74.0 | 60.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 108.0 | 68.0 | 1.80 | NaN | NaN | NaN | 0 | 0 | 0.0 | yes | yes | no | good | no | no | ckd |
| 223 | 71.0 | 90.0 | 1.010 | 0.0 | 3.0 | miss | normal | notpresent | notpresent | 303.0 | 30.0 | 1.30 | 136.0 | 4.1 | 13.0 | 38 | 9200 | 4.6 | yes | yes | no | good | no | no | ckd |
| 224 | 34.0 | 60.0 | 1.020 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 117.0 | 28.0 | 2.20 | 138.0 | 3.8 | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 225 | 60.0 | 90.0 | 1.010 | 3.0 | 5.0 | abnormal | normal | notpresent | present | 490.0 | 95.0 | 2.70 | 131.0 | 3.8 | 11.5 | 35 | 12000 | 4.5 | yes | yes | no | good | no | no | ckd |
| 226 | 64.0 | 100.0 | 1.015 | 4.0 | 2.0 | abnormal | abnormal | notpresent | present | 163.0 | 54.0 | 7.20 | 140.0 | 4.6 | 7.9 | 26 | 7500 | 3.4 | yes | yes | no | good | yes | no | ckd |
| 227 | 57.0 | 80.0 | 1.015 | 0.0 | 0.0 | miss | normal | notpresent | notpresent | 120.0 | 48.0 | 1.60 | NaN | NaN | 11.3 | 36 | 7200 | 3.8 | yes | yes | no | good | no | no | ckd |
| 228 | 60.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 124.0 | 52.0 | 2.50 | NaN | NaN | NaN | 0 | 0 | 0.0 | yes | no | no | good | no | no | ckd |
| 229 | 59.0 | 50.0 | 1.010 | 3.0 | 0.0 | normal | abnormal | notpresent | notpresent | 241.0 | 191.0 | 12.00 | 114.0 | 2.9 | 9.6 | 31 | 15700 | 3.8 | no | yes | no | good | yes | no | ckd |
| 230 | 65.0 | 60.0 | 1.010 | 2.0 | 0.0 | normal | abnormal | present | notpresent | 192.0 | 17.0 | 1.70 | 130.0 | 4.3 | NaN | 0 | 9500 | 0.0 | yes | yes | no | poor | no | no | ckd\t |
| 231 | 60.0 | 90.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 269.0 | 51.0 | 2.80 | 138.0 | 3.7 | 11.5 | 35 | 0 | 0.0 | yes | yes | yes | good | yes | no | ckd |
| 232 | 50.0 | 90.0 | 1.015 | 1.0 | 0.0 | abnormal | abnormal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0.0 | no | no | no | good | yes | no | ckd |
| 233 | 51.0 | 100.0 | 1.015 | 2.0 | 0.0 | normal | normal | notpresent | present | 93.0 | 20.0 | 1.60 | 146.0 | 4.5 | NaN | 0 | 0 | 0.0 | no | no | no | poor | no | no | ckd |
| 234 | 37.0 | 100.0 | 1.010 | 0.0 | 0.0 | abnormal | normal | notpresent | notpresent | NaN | 19.0 | 1.30 | NaN | NaN | 15.0 | 44 | 4100 | 5.2 | yes | no | no | good | no | no | ckd |
| 235 | 45.0 | 70.0 | 1.010 | 2.0 | 0.0 | miss | normal | notpresent | notpresent | 113.0 | 93.0 | 2.30 | NaN | NaN | 7.9 | 26 | 5700 | 0.0 | no | no | yes | good | no | yes | ckd |
| 236 | 65.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 74.0 | 66.0 | 2.00 | 136.0 | 5.4 | 9.1 | 25 | 0 | 0.0 | yes | yes | yes | good | yes | no | ckd |
| 237 | 80.0 | 70.0 | 1.015 | 2.0 | 2.0 | miss | normal | notpresent | notpresent | 141.0 | 53.0 | 2.20 | NaN | NaN | 12.7 | 40 | 9600 | 0.0 | yes | yes | no | poor | yes | no | ckd |
| 238 | 72.0 | 100.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 201.0 | 241.0 | 13.40 | 127.0 | 4.8 | 9.4 | 28 | 0 | 0.0 | yes | yes | no | good | no | yes | ckd |
| 239 | 34.0 | 90.0 | 1.015 | 2.0 | 0.0 | normal | normal | notpresent | notpresent | 104.0 | 50.0 | 1.60 | 137.0 | 4.1 | 11.9 | 39 | 0 | 0.0 | no | no | no | good | no | no | ckd |
| 240 | 65.0 | 70.0 | 1.015 | 1.0 | 0.0 | miss | normal | notpresent | notpresent | 203.0 | 46.0 | 1.40 | NaN | NaN | 11.4 | 36 | 5000 | 4.1 | yes | yes | no | poor | yes | no | ckd |
| 241 | 57.0 | 70.0 | 1.015 | 1.0 | 0.0 | miss | abnormal | notpresent | notpresent | 165.0 | 45.0 | 1.50 | 140.0 | 3.3 | 10.4 | 31 | 4200 | 3.9 | no | no | no | good | no | no | ckd |
| 242 | 69.0 | 70.0 | 1.010 | 4.0 | 3.0 | normal | abnormal | present | present | 214.0 | 96.0 | 6.30 | 120.0 | 3.9 | 9.4 | 28 | 11500 | 3.3 | yes | yes | yes | good | yes | yes | ckd |
| 243 | 62.0 | 90.0 | 1.020 | 2.0 | 1.0 | miss | normal | notpresent | notpresent | 169.0 | 48.0 | 2.40 | 138.0 | 2.9 | 13.4 | 47 | 11000 | 6.1 | yes | no | no | good | no | no | ckd |
| 244 | 64.0 | 90.0 | 1.015 | 3.0 | 2.0 | miss | abnormal | present | notpresent | 463.0 | 64.0 | 2.80 | 135.0 | 4.1 | 12.2 | 40 | 9800 | 4.6 | yes | yes | no | good | no | yes | ckd |
| 245 | 48.0 | 100.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 103.0 | 79.0 | 5.30 | 135.0 | 6.3 | 6.3 | 19 | 7200 | 2.6 | yes | no | yes | poor | no | no | ckd |
| 246 | 48.0 | 110.0 | 1.015 | 3.0 | 0.0 | abnormal | normal | present | notpresent | 106.0 | 215.0 | 15.20 | 120.0 | 5.7 | 8.6 | 26 | 5000 | 2.5 | yes | no | yes | good | no | yes | ckd |
| 247 | 54.0 | 90.0 | 1.025 | 1.0 | 0.0 | normal | abnormal | notpresent | notpresent | 150.0 | 18.0 | 1.20 | 140.0 | 4.2 | NaN | 0 | 0 | 0.0 | no | no | no | poor | yes | yes | ckd |
| 248 | 59.0 | 70.0 | 1.010 | 1.0 | 3.0 | abnormal | abnormal | notpresent | notpresent | 424.0 | 55.0 | 1.70 | 138.0 | 4.5 | 12.6 | 37 | 10200 | 4.1 | yes | yes | yes | good | no | no | ckd |
| 249 | 56.0 | 90.0 | 1.010 | 4.0 | 1.0 | normal | abnormal | present | notpresent | 176.0 | 309.0 | 13.30 | 124.0 | 6.5 | 3.1 | 9 | 5400 | 2.1 | yes | yes | no | poor | yes | yes | ckd |
| 250 | 40.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 140.0 | 10.0 | 1.20 | 135.0 | 5.0 | 15.0 | 48 | 10400 | 4.5 | no | no | no | good | no | no | notckd |
| 251 | 23.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 70.0 | 36.0 | 1.00 | 150.0 | 4.6 | 17.0 | 52 | 9800 | 5.0 | no | no | no | good | no | no | notckd |
| 252 | 45.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 82.0 | 49.0 | 0.60 | 147.0 | 4.4 | 15.9 | 46 | 9100 | 4.7 | no | no | no | good | no | no | notckd |
| 253 | 57.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 119.0 | 17.0 | 1.20 | 135.0 | 4.7 | 15.4 | 42 | 6200 | 6.2 | no | no | no | good | no | no | notckd |
| 254 | 51.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 99.0 | 38.0 | 0.80 | 135.0 | 3.7 | 13.0 | 49 | 8300 | 5.2 | no | no | no | good | no | no | notckd |
| 255 | 34.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 121.0 | 27.0 | 1.20 | 144.0 | 3.9 | 13.6 | 52 | 9200 | 6.3 | no | no | no | good | no | no | notckd |
| 256 | 60.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 131.0 | 10.0 | 0.50 | 146.0 | 5.0 | 14.5 | 41 | 10700 | 5.1 | no | no | no | good | no | no | notckd |
| 257 | 38.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 91.0 | 36.0 | 0.70 | 135.0 | 3.7 | 14.0 | 46 | 9100 | 5.8 | no | no | no | good | no | no | notckd |
| 258 | 42.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 98.0 | 20.0 | 0.50 | 140.0 | 3.5 | 13.9 | 44 | 8400 | 5.5 | no | no | no | good | no | no | notckd |
| 259 | 35.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 104.0 | 31.0 | 1.20 | 135.0 | 5.0 | 16.1 | 45 | 4300 | 5.2 | no | no | no | good | no | no | notckd |
| 260 | 30.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 131.0 | 38.0 | 1.00 | 147.0 | 3.8 | 14.1 | 45 | 9400 | 5.3 | no | no | no | good | no | no | notckd |
| 261 | 49.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 122.0 | 32.0 | 1.20 | 139.0 | 3.9 | 17.0 | 41 | 5600 | 4.9 | no | no | no | good | no | no | notckd |
| 262 | 55.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 118.0 | 18.0 | 0.90 | 135.0 | 3.6 | 15.5 | 43 | 7200 | 5.4 | no | no | no | good | no | no | notckd |
| 263 | 45.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 117.0 | 46.0 | 1.20 | 137.0 | 5.0 | 16.2 | 45 | 8600 | 5.2 | no | no | no | good | no | no | notckd |
| 264 | 42.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 132.0 | 24.0 | 0.70 | 140.0 | 4.1 | 14.4 | 50 | 5000 | 4.5 | no | no | no | good | no | no | notckd |
| 265 | 50.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 97.0 | 40.0 | 0.60 | 150.0 | 4.5 | 14.2 | 48 | 10500 | 5.0 | no | no | no | good | no | no | notckd |
| 266 | 55.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 133.0 | 17.0 | 1.20 | 135.0 | 4.8 | 13.2 | 41 | 6800 | 5.3 | no | no | no | good | no | no | notckd |
| 267 | 48.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 122.0 | 33.0 | 0.90 | 146.0 | 3.9 | 13.9 | 48 | 9500 | 4.8 | no | no | no | good | no | no | notckd |
| 268 | NaN | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 100.0 | 49.0 | 1.00 | 140.0 | 5.0 | 16.3 | 53 | 8500 | 4.9 | no | no | no | good | no | no | notckd |
| 269 | 25.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 121.0 | 19.0 | 1.20 | 142.0 | 4.9 | 15.0 | 48 | 6900 | 5.3 | no | no | no | good | no | no | notckd |
| 270 | 23.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 111.0 | 34.0 | 1.10 | 145.0 | 4.0 | 14.3 | 41 | 7200 | 5.0 | no | no | no | good | no | no | notckd |
| 271 | 30.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 96.0 | 25.0 | 0.50 | 144.0 | 4.8 | 13.8 | 42 | 9000 | 4.5 | no | no | no | good | no | no | notckd |
| 272 | 56.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 139.0 | 15.0 | 1.20 | 135.0 | 5.0 | 14.8 | 42 | 5600 | 5.5 | no | no | no | good | no | no | notckd |
| 273 | 47.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 95.0 | 35.0 | 0.90 | 140.0 | 4.1 | NaN | 0 | 0 | 0.0 | no | no | no | good | no | no | notckd |
| 274 | 19.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 107.0 | 23.0 | 0.70 | 141.0 | 4.2 | 14.4 | 44 | 0 | 0.0 | no | no | no | good | no | no | notckd |
| 275 | 52.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 125.0 | 22.0 | 1.20 | 139.0 | 4.6 | 16.5 | 43 | 4700 | 4.6 | no | no | no | good | no | no | notckd |
| 276 | 20.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | NaN | NaN | 137.0 | 4.7 | 14.0 | 41 | 4500 | 5.5 | no | no | no | good | no | no | notckd |
| 277 | 46.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 123.0 | 46.0 | 1.00 | 135.0 | 5.0 | 15.7 | 50 | 6300 | 4.8 | no | no | no | good | no | no | notckd |
| 278 | 48.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 112.0 | 44.0 | 1.20 | 142.0 | 4.9 | 14.5 | 44 | 9400 | 6.4 | no | no | no | good | no | no | notckd |
| 279 | 24.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 140.0 | 23.0 | 0.60 | 140.0 | 4.7 | 16.3 | 48 | 5800 | 5.6 | no | no | no | good | no | no | notckd |
| 280 | 47.0 | 80.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 93.0 | 33.0 | 0.90 | 144.0 | 4.5 | 13.3 | 52 | 8100 | 5.2 | no | no | no | good | no | no | notckd |
| 281 | 55.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 130.0 | 50.0 | 1.20 | 147.0 | 5.0 | 15.5 | 41 | 9100 | 6.0 | no | no | no | good | no | no | notckd |
| 282 | 20.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 123.0 | 44.0 | 1.00 | 135.0 | 3.8 | 14.6 | 44 | 5500 | 4.8 | no | no | no | good | no | no | notckd |
| 283 | 60.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | 16.4 | 43 | 10800 | 5.7 | no | no | no | good | no | no | notckd |
| 284 | 33.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 100.0 | 37.0 | 1.20 | 142.0 | 4.0 | 16.9 | 52 | 6700 | 6.0 | no | no | no | good | no | no | notckd |
| 285 | 66.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 94.0 | 19.0 | 0.70 | 135.0 | 3.9 | 16.0 | 41 | 5300 | 5.9 | no | no | no | good | no | no | notckd |
| 286 | 71.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 81.0 | 18.0 | 0.80 | 145.0 | 5.0 | 14.7 | 44 | 9800 | 6.0 | no | no | no | good | no | no | notckd |
| 287 | 39.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 124.0 | 22.0 | 0.60 | 137.0 | 3.8 | 13.4 | 43 | 0 | 0.0 | no | no | no | good | no | no | notckd |
| 288 | 56.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 70.0 | 46.0 | 1.20 | 135.0 | 4.9 | 15.9 | 50 | 11000 | 5.1 | miss | miss | miss | good | no | no | notckd |
| 289 | 42.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 93.0 | 32.0 | 0.90 | 143.0 | 4.7 | 16.6 | 43 | 7100 | 5.3 | no | no | no | good | no | no | notckd |
| 290 | 54.0 | 70.0 | 1.020 | 0.0 | 0.0 | miss | miss | miss | miss | 76.0 | 28.0 | 0.60 | 146.0 | 3.5 | 14.8 | 52 | 8400 | 5.9 | no | no | no | good | no | no | notckd |
| 291 | 47.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 124.0 | 44.0 | 1.00 | 140.0 | 4.9 | 14.9 | 41 | 7000 | 5.7 | no | no | no | good | no | no | notckd |
| 292 | 30.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 89.0 | 42.0 | 0.50 | 139.0 | 5.0 | 16.7 | 52 | 10200 | 5.0 | no | no | no | good | no | no | notckd |
| 293 | 50.0 | NaN | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 92.0 | 19.0 | 1.20 | 150.0 | 4.8 | 14.9 | 48 | 4700 | 5.4 | no | no | no | good | no | no | notckd |
| 294 | 75.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 110.0 | 50.0 | 0.70 | 135.0 | 5.0 | 14.3 | 40 | 8300 | 5.8 | no | no | no | miss | miss | miss | notckd |
| 295 | 44.0 | 70.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 106.0 | 25.0 | 0.90 | 150.0 | 3.6 | 15.0 | 50 | 9600 | 6.5 | no | no | no | good | no | no | notckd |
| 296 | 41.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 125.0 | 38.0 | 0.60 | 140.0 | 5.0 | 16.8 | 41 | 6300 | 5.9 | no | no | no | good | no | no | notckd |
| 297 | 53.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 116.0 | 26.0 | 1.00 | 146.0 | 4.9 | 15.8 | 45 | 7700 | 5.2 | miss | miss | miss | good | no | no | notckd |
| 298 | 34.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 91.0 | 49.0 | 1.20 | 135.0 | 4.5 | 13.5 | 48 | 8600 | 4.9 | no | no | no | good | no | no | notckd |
| 299 | 73.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 127.0 | 48.0 | 0.50 | 150.0 | 3.5 | 15.1 | 52 | 11000 | 4.7 | no | no | no | good | no | no | notckd |
| 300 | 45.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | miss | miss | 114.0 | 26.0 | 0.70 | 141.0 | 4.2 | 15.0 | 43 | 9200 | 5.8 | no | no | no | good | no | no | notckd |
| 301 | 44.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 96.0 | 33.0 | 0.90 | 147.0 | 4.5 | 16.9 | 41 | 7200 | 5.0 | no | no | no | good | no | no | notckd |
| 302 | 29.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 127.0 | 44.0 | 1.20 | 145.0 | 5.0 | 14.8 | 48 | 0 | 0.0 | no | no | no | good | no | no | notckd |
| 303 | 55.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 107.0 | 26.0 | 1.10 | NaN | NaN | 17.0 | 50 | 6700 | 6.1 | no | no | no | good | no | no | notckd |
| 304 | 33.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 128.0 | 38.0 | 0.60 | 135.0 | 3.9 | 13.1 | 45 | 6200 | 4.5 | no | no | no | good | no | no | notckd |
| 305 | 41.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 122.0 | 25.0 | 0.80 | 138.0 | 5.0 | 17.1 | 41 | 9100 | 5.2 | no | no | no | good | no | no | notckd |
| 306 | 52.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 128.0 | 30.0 | 1.20 | 140.0 | 4.5 | 15.2 | 52 | 4300 | 5.7 | no | no | no | good | no | no | notckd |
| 307 | 47.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 137.0 | 17.0 | 0.50 | 150.0 | 3.5 | 13.6 | 44 | 7900 | 4.5 | no | no | no | good | no | no | notckd |
| 308 | 43.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 81.0 | 46.0 | 0.60 | 135.0 | 4.9 | 13.9 | 48 | 6900 | 4.9 | no | no | no | good | no | no | notckd |
| 309 | 51.0 | 60.0 | 1.020 | 0.0 | 0.0 | miss | miss | notpresent | notpresent | 129.0 | 25.0 | 1.20 | 139.0 | 5.0 | 17.2 | 40 | 8100 | 5.9 | no | no | no | good | no | no | notckd |
| 310 | 46.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 102.0 | 27.0 | 0.70 | 142.0 | 4.9 | 13.2 | 44 | 11000 | 5.4 | no | no | no | good | no | no | notckd |
| 311 | 56.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 132.0 | 18.0 | 1.10 | 147.0 | 4.7 | 13.7 | 45 | 7500 | 5.6 | no | no | no | good | no | no | notckd |
| 312 | 80.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | NaN | NaN | 135.0 | 4.1 | 15.3 | 48 | 6300 | 6.1 | no | no | no | good | no | no | notckd |
| 313 | 55.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 104.0 | 28.0 | 0.90 | 142.0 | 4.8 | 17.3 | 52 | 8200 | 4.8 | no | no | no | good | no | no | notckd |
| 314 | 39.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 131.0 | 46.0 | 0.60 | 145.0 | 5.0 | 15.6 | 41 | 9400 | 4.7 | no | no | no | good | no | no | notckd |
| 315 | 44.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | NaN | NaN | NaN | NaN | 13.8 | 48 | 7800 | 4.4 | no | no | no | good | no | no | notckd |
| 316 | 35.0 | NaN | 1.020 | 0.0 | 0.0 | normal | normal | miss | miss | 99.0 | 30.0 | 0.50 | 135.0 | 4.9 | 15.4 | 48 | 5000 | 5.2 | no | no | no | good | no | no | notckd |
| 317 | 58.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 102.0 | 48.0 | 1.20 | 139.0 | 4.3 | 15.0 | 40 | 8100 | 4.9 | no | no | no | good | no | no | notckd |
| 318 | 61.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 120.0 | 29.0 | 0.70 | 137.0 | 3.5 | 17.4 | 52 | 7000 | 5.3 | no | no | no | good | no | no | notckd |
| 319 | 30.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 138.0 | 15.0 | 1.10 | 135.0 | 4.4 | NaN | 0 | 0 | 0.0 | no | no | no | good | no | no | notckd |
| 320 | 57.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 105.0 | 49.0 | 1.20 | 150.0 | 4.7 | 15.7 | 44 | 10400 | 6.2 | no | no | no | good | no | no | notckd |
| 321 | 65.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 109.0 | 39.0 | 1.00 | 144.0 | 3.5 | 13.9 | 48 | 9600 | 4.8 | no | no | no | good | no | no | notckd |
| 322 | 70.0 | 60.0 | NaN | NaN | NaN | miss | miss | notpresent | notpresent | 120.0 | 40.0 | 0.50 | 140.0 | 4.6 | 16.0 | 43 | 4500 | 4.9 | no | no | no | good | no | no | notckd |
| 323 | 43.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 130.0 | 30.0 | 1.10 | 143.0 | 5.0 | 15.9 | 45 | 7800 | 4.5 | no | no | no | good | no | no | notckd |
| 324 | 40.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 119.0 | 15.0 | 0.70 | 150.0 | 4.9 | NaN | 0 | 0 | 0.0 | no | no | no | good | no | no | notckd |
| 325 | 58.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 100.0 | 50.0 | 1.20 | 140.0 | 3.5 | 14.0 | 50 | 6700 | 6.5 | no | no | no | good | no | no | notckd |
| 326 | 47.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 109.0 | 25.0 | 1.10 | 141.0 | 4.7 | 15.8 | 41 | 8300 | 5.2 | no | no | no | good | no | no | notckd |
| 327 | 30.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 120.0 | 31.0 | 0.80 | 150.0 | 4.6 | 13.4 | 44 | 10700 | 5.8 | no | no | no | good | no | no | notckd |
| 328 | 28.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | miss | miss | 131.0 | 29.0 | 0.60 | 145.0 | 4.9 | NaN | 45 | 8600 | 6.5 | no | no | no | good | no | no | notckd |
| 329 | 33.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 80.0 | 25.0 | 0.90 | 146.0 | 3.5 | 14.1 | 48 | 7800 | 5.1 | no | no | no | good | no | no | notckd |
| 330 | 43.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 114.0 | 32.0 | 1.10 | 135.0 | 3.9 | NaN | 42 | 0 | 0.0 | no | no | no | good | no | no | notckd |
| 331 | 59.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 130.0 | 39.0 | 0.70 | 147.0 | 4.7 | 13.5 | 46 | 6700 | 4.5 | no | no | no | good | no | no | notckd |
| 332 | 34.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | 33.0 | 1.00 | 150.0 | 5.0 | 15.3 | 44 | 10500 | 6.1 | no | no | no | good | no | no | notckd |
| 333 | 23.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 99.0 | 46.0 | 1.20 | 142.0 | 4.0 | 17.7 | 46 | 4300 | 5.5 | no | no | no | good | no | no | notckd |
| 334 | 24.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 125.0 | NaN | NaN | 136.0 | 3.5 | 15.4 | 43 | 5600 | 4.5 | no | no | no | good | no | no | notckd |
| 335 | 60.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 134.0 | 45.0 | 0.50 | 139.0 | 4.8 | 14.2 | 48 | 10700 | 5.6 | no | no | no | good | no | no | notckd |
| 336 | 25.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 119.0 | 27.0 | 0.50 | NaN | NaN | 15.2 | 40 | 9200 | 5.2 | no | no | no | good | no | no | notckd |
| 337 | 44.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 92.0 | 40.0 | 0.90 | 141.0 | 4.9 | 14.0 | 52 | 7500 | 6.2 | no | no | no | good | no | no | notckd |
| 338 | 62.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 132.0 | 34.0 | 0.80 | 147.0 | 3.5 | 17.8 | 44 | 4700 | 4.5 | no | no | no | good | no | no | notckd |
| 339 | 25.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 88.0 | 42.0 | 0.50 | 136.0 | 3.5 | 13.3 | 48 | 7000 | 4.9 | no | no | no | good | no | no | notckd |
| 340 | 32.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 100.0 | 29.0 | 1.10 | 142.0 | 4.5 | 14.3 | 43 | 6700 | 5.9 | no | no | no | good | no | no | notckd |
| 341 | 63.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 130.0 | 37.0 | 0.90 | 150.0 | 5.0 | 13.4 | 41 | 7300 | 4.7 | no | no | no | good | no | no | notckd |
| 342 | 44.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 95.0 | 46.0 | 0.50 | 138.0 | 4.2 | 15.0 | 50 | 7700 | 6.3 | no | no | no | good | no | no | notckd |
| 343 | 37.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 111.0 | 35.0 | 0.80 | 135.0 | 4.1 | 16.2 | 50 | 5500 | 5.7 | no | no | no | good | no | no | notckd |
| 344 | 64.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 106.0 | 27.0 | 0.70 | 150.0 | 3.3 | 14.4 | 42 | 8100 | 4.7 | no | no | no | good | no | no | notckd |
| 345 | 22.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 97.0 | 18.0 | 1.20 | 138.0 | 4.3 | 13.5 | 42 | 7900 | 6.4 | no | no | no | good | no | no | notckd |
| 346 | 33.0 | 60.0 | NaN | NaN | NaN | normal | normal | notpresent | notpresent | 130.0 | 41.0 | 0.90 | 141.0 | 4.4 | 15.5 | 52 | 4300 | 5.8 | no | no | no | good | no | no | notckd |
| 347 | 43.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 108.0 | 25.0 | 1.00 | 144.0 | 5.0 | 17.8 | 43 | 7200 | 5.5 | no | no | no | good | no | no | notckd |
| 348 | 38.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 99.0 | 19.0 | 0.50 | 147.0 | 3.5 | 13.6 | 44 | 7300 | 6.4 | no | no | no | good | no | no | notckd |
| 349 | 35.0 | 70.0 | 1.025 | 0.0 | 0.0 | miss | miss | notpresent | notpresent | 82.0 | 36.0 | 1.10 | 150.0 | 3.5 | 14.5 | 52 | 9400 | 6.1 | no | no | no | good | no | no | notckd |
| 350 | 65.0 | 70.0 | 1.025 | 0.0 | 0.0 | miss | miss | notpresent | notpresent | 85.0 | 20.0 | 1.00 | 142.0 | 4.8 | 16.1 | 43 | 9600 | 4.5 | no | no | no | good | no | no | notckd |
| 351 | 29.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 83.0 | 49.0 | 0.90 | 139.0 | 3.3 | 17.5 | 40 | 9900 | 4.7 | no | no | no | good | no | no | notckd |
| 352 | 37.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 109.0 | 47.0 | 1.10 | 141.0 | 4.9 | 15.0 | 48 | 7000 | 5.2 | no | no | no | good | no | no | notckd |
| 353 | 39.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 86.0 | 37.0 | 0.60 | 150.0 | 5.0 | 13.6 | 51 | 5800 | 4.5 | no | no | no | good | no | no | notckd |
| 354 | 32.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 102.0 | 17.0 | 0.40 | 147.0 | 4.7 | 14.6 | 41 | 6800 | 5.1 | no | no | no | good | no | no | notckd |
| 355 | 23.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 95.0 | 24.0 | 0.80 | 145.0 | 5.0 | 15.0 | 52 | 6300 | 4.6 | no | no | no | good | no | no | notckd |
| 356 | 34.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 87.0 | 38.0 | 0.50 | 144.0 | 4.8 | 17.1 | 47 | 7400 | 6.1 | no | no | no | good | no | no | notckd |
| 357 | 66.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 107.0 | 16.0 | 1.10 | 140.0 | 3.6 | 13.6 | 42 | 11000 | 4.9 | no | no | no | good | no | no | notckd |
| 358 | 47.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 117.0 | 22.0 | 1.20 | 138.0 | 3.5 | 13.0 | 45 | 5200 | 5.6 | no | no | no | good | no | no | notckd |
| 359 | 74.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 88.0 | 50.0 | 0.60 | 147.0 | 3.7 | 17.2 | 53 | 6000 | 4.5 | no | no | no | good | no | no | notckd |
| 360 | 35.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 105.0 | 39.0 | 0.50 | 135.0 | 3.9 | 14.7 | 43 | 5800 | 6.2 | no | no | no | good | no | no | notckd |
| 361 | 29.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 70.0 | 16.0 | 0.70 | 138.0 | 3.5 | 13.7 | 54 | 5400 | 5.8 | no | no | no | good | no | no | notckd |
| 362 | 33.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 89.0 | 19.0 | 1.10 | 144.0 | 5.0 | 15.0 | 40 | 10300 | 4.8 | no | no | no | good | no | no | notckd |
| 363 | 67.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 99.0 | 40.0 | 0.50 | NaN | NaN | 17.8 | 44 | 5900 | 5.2 | no | no | no | good | no | no | notckd |
| 364 | 73.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 118.0 | 44.0 | 0.70 | 137.0 | 3.5 | 14.8 | 45 | 9300 | 4.7 | no | no | no | good | no | no | notckd |
| 365 | 24.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 93.0 | 46.0 | 1.00 | 145.0 | 3.5 | NaN | 0 | 10700 | 6.3 | no | no | no | good | no | no | notckd |
| 366 | 60.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 81.0 | 15.0 | 0.50 | 141.0 | 3.6 | 15.0 | 46 | 10500 | 5.3 | no | no | no | good | no | no | notckd |
| 367 | 68.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 125.0 | 41.0 | 1.10 | 139.0 | 3.8 | 17.4 | 50 | 6700 | 6.1 | no | no | no | good | no | no | notckd |
| 368 | 30.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 82.0 | 42.0 | 0.70 | 146.0 | 5.0 | 14.9 | 45 | 9400 | 5.9 | no | no | no | good | no | no | notckd |
| 369 | 75.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 107.0 | 48.0 | 0.80 | 144.0 | 3.5 | 13.6 | 46 | 10300 | 4.8 | no | no | no | good | no | no | notckd |
| 370 | 69.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 83.0 | 42.0 | 1.20 | 139.0 | 3.7 | 16.2 | 50 | 9300 | 5.4 | no | no | no | good | no | no | notckd |
| 371 | 28.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 79.0 | 50.0 | 0.50 | 145.0 | 5.0 | 17.6 | 51 | 6500 | 5.0 | no | no | no | good | no | no | notckd |
| 372 | 72.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 109.0 | 26.0 | 0.90 | 150.0 | 4.9 | 15.0 | 52 | 10500 | 5.5 | no | no | no | good | no | no | notckd |
| 373 | 61.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 133.0 | 38.0 | 1.00 | 142.0 | 3.6 | 13.7 | 47 | 9200 | 4.9 | no | no | no | good | no | no | notckd |
| 374 | 79.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 111.0 | 44.0 | 1.20 | 146.0 | 3.6 | 16.3 | 40 | 8000 | 6.4 | no | no | no | good | no | no | notckd |
| 375 | 70.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 74.0 | 41.0 | 0.50 | 143.0 | 4.5 | 15.1 | 48 | 9700 | 5.6 | no | no | no | good | no | no | notckd |
| 376 | 58.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 88.0 | 16.0 | 1.10 | 147.0 | 3.5 | 16.4 | 53 | 9100 | 5.2 | no | no | no | good | no | no | notckd |
| 377 | 64.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 97.0 | 27.0 | 0.70 | 145.0 | 4.8 | 13.8 | 49 | 6400 | 4.8 | no | no | no | good | no | no | notckd |
| 378 | 71.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | NaN | NaN | 0.90 | 140.0 | 4.8 | 15.2 | 42 | 7700 | 5.5 | no | no | no | good | no | no | notckd |
| 379 | 62.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 78.0 | 45.0 | 0.60 | 138.0 | 3.5 | 16.1 | 50 | 5400 | 5.7 | no | no | no | good | no | no | notckd |
| 380 | 59.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 113.0 | 23.0 | 1.10 | 139.0 | 3.5 | 15.3 | 54 | 6500 | 4.9 | no | no | no | good | no | no | notckd |
| 381 | 71.0 | 70.0 | 1.025 | 0.0 | 0.0 | miss | miss | notpresent | notpresent | 79.0 | 47.0 | 0.50 | 142.0 | 4.8 | 16.6 | 40 | 5800 | 5.9 | no | no | no | good | no | no | notckd |
| 382 | 48.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 75.0 | 22.0 | 0.80 | 137.0 | 5.0 | 16.8 | 51 | 6000 | 6.5 | no | no | no | good | no | no | notckd |
| 383 | 80.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 119.0 | 46.0 | 0.70 | 141.0 | 4.9 | 13.9 | 49 | 5100 | 5.0 | no | no | no | good | no | no | notckd |
| 384 | 57.0 | 60.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 132.0 | 18.0 | 1.10 | 150.0 | 4.7 | 15.4 | 42 | 11000 | 4.5 | no | no | no | good | no | no | notckd |
| 385 | 63.0 | 70.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 113.0 | 25.0 | 0.60 | 146.0 | 4.9 | 16.5 | 52 | 8000 | 5.1 | no | no | no | good | no | no | notckd |
| 386 | 46.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 100.0 | 47.0 | 0.50 | 142.0 | 3.5 | 16.4 | 43 | 5700 | 6.5 | no | no | no | good | no | no | notckd |
| 387 | 15.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 93.0 | 17.0 | 0.90 | 136.0 | 3.9 | 16.7 | 50 | 6200 | 5.2 | no | no | no | good | no | no | notckd |
| 388 | 51.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 94.0 | 15.0 | 1.20 | 144.0 | 3.7 | 15.5 | 46 | 9500 | 6.4 | no | no | no | good | no | no | notckd |
| 389 | 41.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 112.0 | 48.0 | 0.70 | 140.0 | 5.0 | 17.0 | 52 | 7200 | 5.8 | no | no | no | good | no | no | notckd |
| 390 | 52.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 99.0 | 25.0 | 0.80 | 135.0 | 3.7 | 15.0 | 52 | 6300 | 5.3 | no | no | no | good | no | no | notckd |
| 391 | 36.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 85.0 | 16.0 | 1.10 | 142.0 | 4.1 | 15.6 | 44 | 5800 | 6.3 | no | no | no | good | no | no | notckd |
| 392 | 57.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 133.0 | 48.0 | 1.20 | 147.0 | 4.3 | 14.8 | 46 | 6600 | 5.5 | no | no | no | good | no | no | notckd |
| 393 | 43.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 117.0 | 45.0 | 0.70 | 141.0 | 4.4 | 13.0 | 54 | 7400 | 5.4 | no | no | no | good | no | no | notckd |
| 394 | 50.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 137.0 | 46.0 | 0.80 | 139.0 | 5.0 | 14.1 | 45 | 9500 | 4.6 | no | no | no | good | no | no | notckd |
| 395 | 55.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 140.0 | 49.0 | 0.50 | 150.0 | 4.9 | 15.7 | 47 | 6700 | 4.9 | no | no | no | good | no | no | notckd |
| 396 | 42.0 | 70.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 75.0 | 31.0 | 1.20 | 141.0 | 3.5 | 16.5 | 54 | 7800 | 6.2 | no | no | no | good | no | no | notckd |
| 397 | 12.0 | 80.0 | 1.020 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 100.0 | 26.0 | 0.60 | 137.0 | 4.4 | 15.8 | 49 | 6600 | 5.4 | no | no | no | good | no | no | notckd |
| 398 | 17.0 | 60.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 114.0 | 50.0 | 1.00 | 135.0 | 4.9 | 14.2 | 51 | 7200 | 5.9 | no | no | no | good | no | no | notckd |
| 399 | 58.0 | 80.0 | 1.025 | 0.0 | 0.0 | normal | normal | notpresent | notpresent | 131.0 | 18.0 | 1.10 | 141.0 | 3.5 | 15.8 | 53 | 6800 | 6.1 | no | no | no | good | no | no | notckd |
# some further cleaning is required to remove the \t characters in a couple of columns, replacing those instances with the standard formatting
# classification, cad, dm
df_clean['classification'] = df_clean['classification'].replace("ckd\t","ckd")
df_clean['cad'] = df_clean['cad'].replace("\tno","no")
df_clean['dm'] = df_clean['dm'].replace("\tno","no")
df_clean['dm'] = df_clean['dm'].replace("\tyes", "yes")
df_clean['dm'] = df_clean['dm'].replace(" yes", "yes")
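A minimal sketch of a more general way to do this cleanup: instead of enumerating each bad value, strip leading/trailing whitespace (including tabs) from every object column at once. The toy dataframe below only mimics the stray `\t` artefacts seen in the real data.

```python
import pandas as pd

# Toy frame with the same stray "\t" artefacts seen in the real columns
df = pd.DataFrame({
    "classification": ["ckd\t", "ckd", "notckd"],
    "dm": ["\tno", "\tyes", " yes"],
})

# Strip leading/trailing whitespace (tabs included) from every object column
for col in df.select_dtypes(include="object"):
    df[col] = df[col].str.strip()

print(df["classification"].tolist())  # ['ckd', 'ckd', 'notckd']
print(df["dm"].tolist())              # ['no', 'yes', 'yes']
```

This also catches any whitespace variants that per-value `replace` calls would miss.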
# subsetting columns with another boolean mask for categorical columns and object columns
cat_mask_obj2 = (df_clean.dtypes == "object") | (df_clean.dtypes == "category")
# Get list of categorical column names
cat_mask_object2 = df_clean.columns[cat_mask_obj2].tolist()
# remove the column classification
cat_mask_object2.remove("classification")
# see what columns are left
print(cat_mask_object2)
['rbc', 'pc', 'pcc', 'ba', 'htn', 'dm', 'cad', 'appet', 'pe', 'ane']
# to do: review how the categorical imputer in the XGBoost course works
# combine everything and use DictVectorizer for one-hot encoding and label encoding
# first convert our dataframe into a list of dicts, the input format DictVectorizer expects
# (this vectorizer is mostly used in text processing)
df_dict = df_clean[cat_mask_object2].to_dict("records")
# Make a DictVectorizer (see the scikit-learn documentation for how it works)
# In short, it performs one-hot encoding quickly and creates meaningfully named columns
# sparse=False returns a dense array rather than a sparse matrix
dv = DictVectorizer(sparse = False)
# Apply fit_transform to our dataset
df_encoded = dv.fit_transform(df_dict)
# see 10 rows
print (df_encoded[:10,:])
print ("=" * 100) # just formatting to distinguish outputs
# print the vocabulary that is, the columns of the dataset, note that order changes
# upon transformation
print(dv.vocabulary_)
print ("=" * 100) # more formatting
print(df_encoded.shape) # number of rows and columns for the encoded dataset
print(df_clean[cat_mask_object2].shape) # number of rows and columns for the original dataset
print("After doing the transformation the columns increase to 21.")
[[0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1. 0.
0. 1. 0. 0. 1. 0.]
[0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0.
0. 1. 0. 0. 1. 0.]
[0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0. 1. 0.
0. 1. 0. 0. 0. 1.]
[0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1.
0. 0. 1. 0. 0. 1.]
[0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0.
0. 1. 0. 0. 0. 1.]
[0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0.
0. 0. 1. 0. 1. 0.]
[0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0.
0. 1. 0. 0. 1. 0.]
[0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 1. 0.
0. 0. 1. 0. 0. 1.]
[0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1.
0. 1. 0. 0. 0. 1.]
[0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1.
0. 1. 0. 1. 0. 0.]]
====================================================================================================
{'rbc=miss': 28, 'pc=normal': 20, 'pcc=notpresent': 22, 'ba=notpresent': 7, 'htn=yes': 17, 'dm=yes': 14, 'cad=no': 10, 'appet=good': 3, 'pe=no': 25, 'ane=no': 1, 'htn=no': 16, 'dm=no': 13, 'rbc=normal': 29, 'appet=poor': 5, 'ane=yes': 2, 'pc=abnormal': 18, 'pcc=present': 23, 'pe=yes': 26, 'pc=miss': 19, 'rbc=abnormal': 27, 'cad=yes': 11, 'ba=present': 8, 'htn=miss': 15, 'dm=miss': 12, 'cad=miss': 9, 'pcc=miss': 21, 'ba=miss': 6, 'appet=miss': 4, 'pe=miss': 24, 'ane=miss': 0}
====================================================================================================
(400, 30)
(400, 10)
After the transformation the number of columns increases from 10 to 30.
# Ideas to try:
# make a pipeline that merges the encoding and the visualization
# use t-SNE and/or PCA to see the differences between groups as the EDA step
# make a train/test split and test ideas from the how-to-win-Kaggle-competitions slides
# wrap the next step in a Pipeline object, as in the XGBoost course, and try random forest, XGBoost and a decision tree classifier
# later, apply ensembling: try all the ensembling techniques you know
# see the transformed dataframe with all the missing values imputed
df_clean[cat_mask_numeric]
| age | bp | sg | al | su | bgr | bu | sc | sod | pot | hemo | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| id | |||||||||||
| 0 | 48.0 | 80.0 | 1.020 | 1.0 | 0.0 | 121.0 | 36.0 | 1.2 | 0.0 | 0.0 | 15.4 |
| 1 | 7.0 | 50.0 | 1.020 | 4.0 | 0.0 | 0.0 | 18.0 | 0.8 | 0.0 | 0.0 | 11.3 |
| 2 | 62.0 | 80.0 | 1.010 | 2.0 | 3.0 | 423.0 | 53.0 | 1.8 | 0.0 | 0.0 | 9.6 |
| 3 | 48.0 | 70.0 | 1.005 | 4.0 | 0.0 | 117.0 | 56.0 | 3.8 | 111.0 | 2.5 | 11.2 |
| 4 | 51.0 | 80.0 | 1.010 | 2.0 | 0.0 | 106.0 | 26.0 | 1.4 | 0.0 | 0.0 | 11.6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | 55.0 | 80.0 | 1.020 | 0.0 | 0.0 | 140.0 | 49.0 | 0.5 | 150.0 | 4.9 | 15.7 |
| 396 | 42.0 | 70.0 | 1.025 | 0.0 | 0.0 | 75.0 | 31.0 | 1.2 | 141.0 | 3.5 | 16.5 |
| 397 | 12.0 | 80.0 | 1.020 | 0.0 | 0.0 | 100.0 | 26.0 | 0.6 | 137.0 | 4.4 | 15.8 |
| 398 | 17.0 | 60.0 | 1.025 | 0.0 | 0.0 | 114.0 | 50.0 | 1.0 | 135.0 | 4.9 | 14.2 |
| 399 | 58.0 | 80.0 | 1.025 | 0.0 | 0.0 | 131.0 | 18.0 | 1.1 | 141.0 | 3.5 | 15.8 |
400 rows × 11 columns
# simply taking the vectorized columns and the numeric columns and bringing them together
# to make an array for a classifier
concat_cols = np.hstack((df_encoded, df_clean[cat_mask_numeric].values))
# another version that is in dataframe format
# make a dataframe with the encoded features and give the columns names from the dictVectorizer
df_cat_var = pd.DataFrame(df_encoded, columns=dv.get_feature_names_out())
# concatenate the numeric dataframe with the encoded categorical dataframe column-wise
concat_cols_df = pd.concat([df_clean[cat_mask_numeric], df_cat_var], axis=1)
concat_cols.shape
(400, 41)
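The shape arithmetic of `np.hstack` can be checked on toy stand-ins: stacking horizontally keeps the row count and adds the column counts, which is how 30 encoded columns plus 11 numeric columns give 41.

```python
import numpy as np

# Toy stand-ins: 4 rows of encoded categoricals (3 cols) and numerics (2 cols)
encoded = np.zeros((4, 3))
numeric = np.ones((4, 2))

# hstack concatenates along the column axis: rows must match, columns add up
combined = np.hstack((encoded, numeric))
print(combined.shape)  # (4, 5)
```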
# the final dataframe we'll use for classification
concat_cols_df
| age | bp | sg | al | su | bgr | bu | sc | sod | pot | ... | pc=normal | pcc=miss | pcc=notpresent | pcc=present | pe=miss | pe=no | pe=yes | rbc=abnormal | rbc=miss | rbc=normal | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48.0 | 80.0 | 1.020 | 1.0 | 0.0 | 121.0 | 36.0 | 1.2 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 7.0 | 50.0 | 1.020 | 4.0 | 0.0 | 0.0 | 18.0 | 0.8 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 62.0 | 80.0 | 1.010 | 2.0 | 3.0 | 423.0 | 53.0 | 1.8 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 48.0 | 70.0 | 1.005 | 4.0 | 0.0 | 117.0 | 56.0 | 3.8 | 111.0 | 2.5 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 4 | 51.0 | 80.0 | 1.010 | 2.0 | 0.0 | 106.0 | 26.0 | 1.4 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | 55.0 | 80.0 | 1.020 | 0.0 | 0.0 | 140.0 | 49.0 | 0.5 | 150.0 | 4.9 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 396 | 42.0 | 70.0 | 1.025 | 0.0 | 0.0 | 75.0 | 31.0 | 1.2 | 141.0 | 3.5 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 397 | 12.0 | 80.0 | 1.020 | 0.0 | 0.0 | 100.0 | 26.0 | 0.6 | 137.0 | 4.4 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 398 | 17.0 | 60.0 | 1.025 | 0.0 | 0.0 | 114.0 | 50.0 | 1.0 | 135.0 | 4.9 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 399 | 58.0 | 80.0 | 1.025 | 0.0 | 0.0 | 131.0 | 18.0 | 1.1 | 141.0 | 3.5 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
400 rows × 41 columns
# this operation is not necessary but it's good to see the final dataframe
#pd.set_option('future.no_silent_downcasting', True)
# now get the target variable into numeric form
# a simpler alternative uses map: y = df_clean["classification"].map(lambda v: 1 if v == "ckd" else 0)
# followed by y = y.values
col_preprocess = df_clean["classification"].replace("ckd", 1)
final_col_preprocess = col_preprocess.replace("notckd", 0)
y = final_col_preprocess.values
print(y)
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
/tmp/ipykernel_814016/899742435.py:5: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
final_col_preprocess = col_preprocess.replace("notckd", 0)
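The `map` alternative mentioned in the comments also sidesteps the downcasting FutureWarning, because the output dtype is determined by the mapping rather than inferred by `replace`. A minimal sketch on a toy series:

```python
import pandas as pd

labels = pd.Series(["ckd", "notckd", "ckd"], name="classification")

# map() produces an integer series directly, so no downcasting warning is raised
y = labels.map({"ckd": 1, "notckd": 0}).to_numpy()
print(y)  # [1 0 1]
```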
# confirm the shapes of the feature matrix and the target vector
print(concat_cols.shape)
print(y.shape)
(400, 41)
(400,)
# now that we have both matrices we can see the distribution of the target variable to know what to do next
# when it comes to preprocessing
final_col_preprocess.reset_index()["classification"].value_counts(normalize=True)
classification
1    0.625
0    0.375
Name: proportion, dtype: float64
Split the dataframe into 3 sets and compare results under the same configuration¶
# three-way split (train/validation/test) to evaluate model performance more reliably
features_train, features_validation_test, labels_train, labels_validation_test = train_test_split(concat_cols_df, y, test_size=0.4, random_state=100)
features_validation, features_test, labels_validation, labels_test = train_test_split( features_validation_test, labels_validation_test, test_size=0.5, random_state=100)
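The two chained calls above produce a 60/20/20 split: the first carves off 40% as a holdout, the second splits that holdout half-and-half. A minimal sketch on toy data shows the resulting sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.array([0] * 40 + [1] * 60)

# First carve off 40% for validation+test, then split that holdout half/half,
# giving a 60/20/20 train/validation/test split overall
X_train, X_hold, y_train_s, y_hold = train_test_split(X, y, test_size=0.4, random_state=100)
X_val, X_test_s, y_val, y_test_s = train_test_split(X_hold, y_hold, test_size=0.5, random_state=100)

print(len(X_train), len(X_val), len(X_test_s))  # 60 20 20
```

The variable names here are toy stand-ins for `features_train`, `features_validation` and `features_test` above.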
# patients with chronic kidney disease outnumber those without: 0.625 ckd vs 0.375 notckd
# the dataset is imbalanced, so plain accuracy is misleading; use a confusion matrix and the F1 score instead
# stratifying keeps the class proportions the same in the train and test sets
# I changed the split from 0.5:0.5 to 0.75:0.25
x_train, x_test, y_train, y_test = train_test_split(concat_cols_df, y, test_size = 0.25, stratify = y, random_state=1243)
# Check if the dimensionality is the same for the feature and target set (train)
print("Is the number of rows the same between the features and the target?")
assert x_train.shape[0] == y_train.shape[0]
print (True)
Is the number of rows the same between the features and the target?
True
# Check if the dimensionality is the same for the feature and target set (test)
print("Is the number of rows the same between the features and the target?")
assert x_test.shape[0] == y_test.shape[0]
print (True)
Is the number of rows the same between the features and the target?
True
# Now checking if the target variable is balanced in the train set
pd.Series(y_train).value_counts(normalize=True)
1    0.623333
0    0.376667
Name: proportion, dtype: float64
# the classes are still imbalanced, so we'll use the F1 score and adjust the class weight of the algorithms used, like logistic regression
# look at the instances of the labels 0 and 1
pd.Series(y_test).value_counts(normalize=True)
1    0.63
0    0.37
Name: proportion, dtype: float64
# convert all the target variables to integers
y_train = y_train.astype(int)
y_test = y_test.astype(int)
# the usual scikit-learn paradigm of specify, fit and predict, here for logistic regression (a continuous perceptron)
clf_lr1 = LogisticRegression(class_weight="balanced", random_state=1243, max_iter=1000)
clf_lr1.fit(x_train, y_train)
preds1 = clf_lr1.predict(x_test)
# using the F1 score instead of other metrics; note scikit-learn's (y_true, y_pred) argument order
score_vote1 = f1_score(y_test, preds1)
print('F1-Score: {:.3f}'.format(score_vote1))
# Calculate the classification report
report1 = classification_report(y_test, preds1,target_names=["notckd", "ckd"])
print(report1)
F1-Score: 0.992
precision recall f1-score support
notckd 0.97 1.00 0.99 37
ckd 1.00 0.98 0.99 63
accuracy 0.99 100
macro avg 0.99 0.99 0.99 100
weighted avg 0.99 0.99 0.99 100
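The F1 score in the report is just the harmonic mean of precision and recall. A quick sketch, plugging in the ckd row from the report above (precision 1.00, recall 0.98), reproduces the printed 0.99:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1(1.00, 0.98), 3))  # 0.99
print(f1(1.0, 1.0))              # 1.0: perfect precision and recall
```

The harmonic mean punishes imbalance between the two, which is why F1 is preferred over accuracy on skewed classes.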
# Specify the Decision tree classifier: asks a bunch of if-else statements to come up with a decision
# adjust the number of min samples leaf based on the game 20 questions (From deep learning for coders by Jeremy Howard)
clf_dt2 = DecisionTreeClassifier(class_weight = "balanced",random_state=1243)
clf_dt2.fit(x_train,y_train)
preds2 = clf_dt2.predict(x_test)
score_vote2 = f1_score(y_test, preds2)  # (y_true, y_pred) argument order
print('F1-Score: {:.3f}'.format(score_vote2))
# Calculate the classification report
report2 = classification_report(y_test, preds2, target_names=["notckd", "ckd"])
print(report2)
F1-Score: 0.959
precision recall f1-score support
notckd 0.88 1.00 0.94 37
ckd 1.00 0.92 0.96 63
accuracy 0.95 100
macro avg 0.94 0.96 0.95 100
weighted avg 0.96 0.95 0.95 100
# check the parameters of the decision tree classifier in new lines to see what you can change
clf_dt2.get_params().keys()
dict_keys(['ccp_alpha', 'class_weight', 'criterion', 'max_depth', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'min_samples_leaf', 'min_samples_split', 'min_weight_fraction_leaf', 'monotonic_cst', 'random_state', 'splitter'])
# data normalization helper functions:
# log transform (skew), z-scores (standardized_data), dimensionality reduction with
# PCA (dim_reduction) and function composition helpers (compose2, compose)
from functools import reduce

def skew(data):
    # log transform to reduce skew; only valid for strictly positive values
    return np.log(data)

def standardized_data(data):
    # scale each column to mean 0 and standard deviation 1
    scaler = StandardScaler()
    scaler.fit(data)
    return scaler.transform(data)

def dim_reduction(data):
    # project the data onto its first two principal components
    pca = PCA(n_components=2)
    return pca.fit_transform(data)

def compose2(f, g):
    # compose2(f, g)(x) == f(g(x)): g runs first, then f
    return lambda *a, **kw: f(g(*a, **kw))

def compose(*fs):
    return reduce(compose2, fs)

# note: applying this raises errors because z-scores include zero and negative
# values, which are outside the domain of the log
normalize_data = compose2(skew, standardized_data)
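A small self-contained sketch of why `compose2(skew, standardized_data)` fails: composition applies the right-hand function first, and z-scores contain zero and negative values, which `np.log` cannot handle.

```python
import numpy as np

def compose2(f, g):
    # compose2(f, g)(x) == f(g(x)): g runs first, then f
    return lambda *a, **kw: f(g(*a, **kw))

square = lambda x: x ** 2
add_one = lambda x: x + 1

# add_one runs first: (3 + 1) ** 2
print(compose2(square, add_one)(3))  # 16

# z-scores include non-positive values, so log produces nan / -inf
z = np.array([-1.2, 0.0, 1.2])
with np.errstate(divide="ignore", invalid="ignore"):
    print(np.log(z))  # nan, -inf, 0.1823...
```

Applying the log before standardizing (on strictly positive raw features) avoids the problem.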
# mean of 0 and std of 1 for all the columns
# (caveat: fitting the scaler separately on each split lets split-specific statistics
# leak into the transform; fitting once on the train set and reusing it is the safer pattern)
scaled_x_train = standardized_data(x_train)
scaled_x_test = standardized_data(x_test)
# do the same for the smaller validation and test sets
scaled_features_validation = standardized_data(features_validation)
scaled_features_test = standardized_data(features_test)
# reduce dimensionality to 2
transform_data = compose2(standardized_data, dim_reduction)
dim_red_x_train = transform_data(x_train)
dim_red_x_test = transform_data(x_test)
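The scale-then-reduce step can also be expressed with scikit-learn's own composition tool, `Pipeline`, which additionally fits the scaler and PCA on the training split only and reuses them on the test split. A minimal sketch on random toy data:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(80, 5)), rng.normal(size=(20, 5))

# Fit scaler and PCA on the training split only, then reuse on the test split;
# this avoids leaking test-set statistics into the transform
transform = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
train_2d = transform.fit_transform(X_tr)
test_2d = transform.transform(X_te)

print(train_2d.shape, test_2d.shape)  # (80, 2) (20, 2)
```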
# Logistic regression: a modified perceptron that uses the sigmoid function, hence a continuous perceptron
clf_lr3 = LogisticRegression(class_weight="balanced", random_state=1243)
clf_lr3.fit(scaled_x_train, y_train)
preds3 = clf_lr3.predict(scaled_x_test)
score_vote3 = f1_score(y_test, preds3)  # (y_true, y_pred) argument order
print('F1-Score: {:.3f}'.format(score_vote3))
# Calculate the classification report
report3= classification_report(y_test, preds3,target_names=["notckd", "ckd"])
print(report3)
F1-Score: 0.976
precision recall f1-score support
notckd 0.93 1.00 0.96 37
ckd 1.00 0.95 0.98 63
accuracy 0.97 100
macro avg 0.96 0.98 0.97 100
weighted avg 0.97 0.97 0.97 100
# Logistic regression but the features have been compressed
clf_lr5 = LogisticRegression(class_weight="balanced",random_state=1243)
clf_lr5.fit(dim_red_x_train,y_train)
preds5 = clf_lr5.predict(dim_red_x_test)
score_vote5 = f1_score(y_test, preds5)  # (y_true, y_pred) argument order
print('F1-Score: {:.3f}'.format(score_vote5))
# Make a classification report
report5 = classification_report(y_test, preds5,target_names=["notckd", "ckd"])
print(report5)
F1-Score: 0.603
precision recall f1-score support
notckd 0.40 0.51 0.45 37
ckd 0.66 0.56 0.60 63
accuracy 0.54 100
macro avg 0.53 0.53 0.53 100
weighted avg 0.57 0.54 0.55 100
# Decision tree classifier but with scaled features
clf_dt4 = DecisionTreeClassifier(class_weight="balanced", random_state=1243, min_samples_leaf=25)
clf_dt4.fit(scaled_x_train,y_train)
preds4 = clf_dt4.predict(scaled_x_test)
score_vote4 = f1_score(y_test, preds4)  # (y_true, y_pred) argument order
print('F1-Score: {:.3f}'.format(score_vote4))
# Make a classification report
report4 = classification_report(y_test, preds4, target_names=["notckd", "ckd"],)
print(report4)
F1-Score: 0.879
precision recall f1-score support
notckd 0.74 0.95 0.83 37
ckd 0.96 0.81 0.88 63
accuracy 0.86 100
macro avg 0.85 0.88 0.86 100
weighted avg 0.88 0.86 0.86 100
# take the coefficient matrix and check its dimensions
clf_lr1.coef_.shape
(1, 41)
To do: Tests for overfitting¶
# helps with visualizing the decision function for the classifier
def plot_points(features, labels):
'''
Scatter-plot the first two feature columns, marking ckd and notckd samples with different colors and markers.
'''
X = np.array(features) # convert the features into a numpy array
y = np.array(labels) # convert the labels into a numpy array
ckd = X[np.argwhere(y==1)] # rows for individuals with ckd
notckd = X[np.argwhere(y==0)] # rows for individuals without ckd
plt.scatter([s[0][0] for s in ckd],
[s[0][1] for s in ckd],
s = 30,
color = 'cyan',
edgecolor = 'k',
marker = '^')
plt.scatter([s[0][0] for s in notckd],
[s[0][1] for s in notckd],
s = 30,
color = 'red',
edgecolor = 'k',
marker = 's')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend(['ckd','notckd'])
def draw_line(a,b,c, color='black', linewidth=2.0, linestyle='solid', starting=0, ending=3):
# Plotting the line ax + by + c = 0
x = np.linspace(starting, ending, 1000)
plt.plot(x, -c/b - a*x/b, linestyle=linestyle, color=color, linewidth=linewidth)
# Trying to visualize the function but this didn't work so well
X = np.array(concat_cols)
y = np.array(y)
ckd = X[np.argwhere(y==1)]
notckd = X[np.argwhere(y==0)]
plt.scatter([s[0][0] for s in ckd],
[s[0][1] for s in ckd],
s = 25,
color = 'cyan',
edgecolor = 'k',
marker = '^')
plt.scatter([s[0][0] for s in notckd],
[s[0][1] for s in notckd],
s = 25,
color = 'red',
edgecolor = 'k',
marker = 's')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend(['ckd','notckd'])
<matplotlib.legend.Legend at 0x7576f7874430>
# This needs some fixing: Please ignore this for now.
plot_points(scaled_x_train, y_train)
draw_line(1, 1, clf_lr1.intercept_) # intercept_ is the fitted value; fit_intercept is only a boolean hyperparameter
# Check this out https://github.com/luisguiserrano/manning/blob/master/Chapter%205%20-%20Logistic%20Regression/Coding%20the%20Logistic%20Regression%20Algorithm.ipynb
%matplotlib inline
# plotting feature importance for the Decision tree
# grab the column names as a list
features = concat_cols_df.columns
# get the feature importances
important_features = clf_dt2.feature_importances_
# find the indices of a sorted array
feature_indices = np.argsort(important_features)
# make a plot
plt.title('Feature Importances Decision Tree')
plt.xticks(fontsize=6, rotation = 45)
plt.barh(range(len(feature_indices)), important_features[feature_indices], color='g', align='center')
plt.yticks(range(len(feature_indices)), [features[i] for i in feature_indices], fontsize = 6)
plt.xlabel('Relative Importance')
plt.show()
# Reviewing feature importance using the logistic regression and the C parameter
# grab the coefficients and transpose the array
# label the C parameter
plt.plot(np.sort(clf_lr1.coef_.T), 'o', label="C=1",color = "g")
plt.xticks(range(concat_cols_df.shape[1]), concat_cols_df.columns, rotation=90)
plt.hlines(0, 0, concat_cols_df.shape[1])
plt.title("Examination of feature importance")
plt.xlabel("Coefficient index")
plt.ylabel("Coefficient magnitude")
plt.legend()
<matplotlib.legend.Legend at 0x73368de40610>
# Draw a feature importance plot for the logistic regression in the same way
plt.title('Feature Importances Logistic Regression')
plt.xticks(fontsize=6, rotation = 45)
plt.barh(range(len(feature_indices)), clf_lr1.coef_[0][feature_indices], color='g', align='center')
plt.yticks(range(len(feature_indices)), [features[i] for i in feature_indices], fontsize = 6)
plt.xlabel('Relative Importance')
plt.show()
#!pip install pydotplus -q
# draw the decision tree
# add more comments for this
import six
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
dot_data = six.StringIO()
export_graphviz(clf_dt2, out_file=dot_data,
filled=True, rounded=True,
special_characters=True, feature_names = concat_cols_df.columns, class_names =["notckd", "ckd"])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
# look at hemo(hemoglobin), sg(specific gravity), al(albumin), sod(sodium), rbc=normal(red blood cells), htn=yes(hypertension), bu(blood urea)
# dm (diabetes mellitus)
Let's see the tree a bit differently, in a text-based format, to help us export this work to a report or attach it to Data Version Control with less effort.
from sklearn.tree import export_text
rules = export_text(clf_dt2, feature_names=list(concat_cols_df.columns))
print(rules)
|--- hemo <= 12.85
|   |--- sod <= 143.50
|   |   |--- sg <= 1.02
|   |   |   |--- class: 1
|   |   |--- sg > 1.02
|   |   |   |--- hemo <= 2.90
|   |   |   |   |--- rbc=normal <= 0.50
|   |   |   |   |   |--- pcc=present <= 0.50
|   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |--- pcc=present > 0.50
|   |   |   |   |   |   |--- class: 1
|   |   |   |   |--- rbc=normal > 0.50
|   |   |   |   |   |--- class: 0
|   |   |   |--- hemo > 2.90
|   |   |   |   |--- class: 1
|   |--- sod > 143.50
|   |   |--- age <= 43.00
|   |   |   |--- class: 0
|   |   |--- age > 43.00
|   |   |   |--- pot <= 4.35
|   |   |   |   |--- class: 1
|   |   |   |--- pot > 4.35
|   |   |   |   |--- class: 1
|--- hemo > 12.85
|   |--- sg <= 1.02
|   |   |--- sg <= 0.50
|   |   |   |--- htn=yes <= 0.50
|   |   |   |   |--- class: 0
|   |   |   |--- htn=yes > 0.50
|   |   |   |   |--- class: 1
|   |   |--- sg > 0.50
|   |   |   |--- class: 1
|   |--- sg > 1.02
|   |   |--- dm=yes <= 0.50
|   |   |   |--- class: 0
|   |   |--- dm=yes > 0.50
|   |   |   |--- class: 1
Here we see the decision tree in another representation, one that could save memory, especially when working with data version control systems. Reading the rules directly is a bit like running a profiler on the data. The shallow depth and mostly pure splits suggest the decision tree is not badly overfitting; the model classifies the data well.
# permutation test: a check of how robust the model is
# shuffle one feature at a time and measure how much performance degrades
from sklearn.inspection import permutation_importance
result = permutation_importance(clf_dt2, x_test, y_test, n_repeats=10, random_state=1243)
sorted_idx = result.importances_mean.argsort()
plt.barh(range(len(sorted_idx)), result.importances_mean[sorted_idx], color='g', align='center')
plt.yticks(range(len(sorted_idx)), [features[i] for i in sorted_idx], fontsize = 6)
plt.xlabel('Relative Importance')
plt.title('Permutation Importances Decision Tree')
Text(0.5, 1.0, 'Permutation Importances Decision Tree')
With the permutation test we add noise by randomly shuffling one feature at a time while keeping the others constant. This helps us evaluate whether the same features remain important; think of it as repeating the experiment multiple times. The top features are retained: hemo (hemoglobin), dm (diabetes mellitus) and sg (specific gravity). The rbc=normal feature seems to push predictions down, so it may be safe to remove, although the shuffling itself might have affected that result. Now, let's estimate the SHAP values.
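The idea behind `permutation_importance` can be sketched by hand on toy data: shuffle one column, re-score the model, and record the accuracy drop. In this hypothetical setup one feature drives the label and the other is pure noise, so shuffling the informative column should hurt far more.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1243)
# Feature 0 drives the label; feature 1 is pure noise
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X, y)
base = model.score(X, y)

# Shuffle one column at a time and record the accuracy drop
drops = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    drops.append(base - model.score(Xp, y))

# The informative feature loses far more accuracy when shuffled
print(drops)
```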
# using shap values to compare the feature importance
# if they are stable
import shap
from sklearn.metrics.pairwise import cosine_similarity
# calculate shap values for logistic regression
# Wrap the model's predict_proba method in a callable function
def model_predict_proba(X):
return clf_lr1.predict_proba(X)
# Create the SHAP explainer using the callable function
explainer = shap.Explainer(model_predict_proba, x_train)
shap_values = explainer(x_train)
# summarize the effects of all the features: take absolute SHAP values, sum over
# the feature axis (axis=1), then average over samples (one total per class)
feature_imp_mod1 = np.mean(np.abs(shap_values.values).sum(axis=1), axis=0)
# calculate shap values for decision tree
explainer2 = shap.Explainer(clf_dt2)
shap_values2 = explainer2.shap_values(x_train)
# same summary for the decision tree: absolute values summed over the feature
# axis, then averaged over samples
feature_imp_mod2 = np.mean(np.abs(shap_values2).sum(axis=1), axis=0)
# consistency calculation
consistency = cosine_similarity([feature_imp_mod1], [feature_imp_mod2])
print("The consistency of the two models is: ", consistency)
The consistency of the two models is: [[1.]]
SHAP values help us understand exactly how the model makes its predictions and which features matter most in making them. We compared both models by generating SHAP values for each. The cosine similarity between the logistic regression and the decision tree summaries is 1, which suggests the two models rely on the same signal and that the features have robust predictive power regardless of model choice. A caveat: because the summary sums over the feature axis, each model is reduced to a very short per-class vector, so a similarity of exactly 1 is less surprising than it looks.
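For reference, cosine similarity is just the dot product of two vectors divided by the product of their lengths; a minimal sketch without scikit-learn:

```python
import numpy as np

def cosine(u, v):
    # cos(theta) = u.v / (|u| * |v|)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(cosine([1.0, 2.0], [2.0, 4.0]), 6))  # 1.0: parallel vectors
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 6))  # 0.0: orthogonal vectors
```

Note that any two parallel vectors score 1 regardless of magnitude, which is why short summary vectors can reach 1 easily.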
# Calculating model faithfulness by changing features and seeing if predictions change
try_pred = clf_dt2.predict(x_test)
# check the predictions
print("Original predictions: ", try_pred)
# change the hemo feature and compare against the original predictions
# hemo ranges between 3.1 and 17.8; the tree's decisive split is at 12.85
x_test2 = x_test.copy()
# note: iloc[:, 0] would modify age (the first column), so address hemo by name
x_test2["hemo"] = 12.85
# check the predictions
preds2 = clf_dt2.predict(x_test2)
# compare the predictions with the actual y_test values
# Calculate cosine similarity between original predictions and actual y_test
similarity_original = cosine_similarity([try_pred], [y_test])
# Calculate cosine similarity between modified predictions and actual y_test
similarity_modified = cosine_similarity([preds2], [y_test])
print("Cosine similarity (original predictions vs actual): ", similarity_original[0][0])
print("Cosine similarity (modified predictions vs actual): ", similarity_modified[0][0])
Original predictions: [1 0 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 0] Cosine similarity (original predictions vs actual): 0.9594972228385658 Cosine similarity (modified predictions vs actual): 0.9428090415820634
/tmp/ipykernel_814016/3853791278.py:11: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '12.85' has dtype incompatible with float32, please explicitly cast to a compatible dtype first. x_test2.iloc[:, 0] = 12.85
Even when we change the features, we still get more or less the same predictions. This model is not overfitting and indeed has robust predictive power. Now let's interpret the tree using the medical knowledge we have:
At the start, all 300 training samples pass through the root split on hemoglobin: samples with levels less than or equal to 12.85 are sent down the left side of the tree, where the level of specific gravity is considered next. Both of these features are looked at in kidney function work-ups; the urine strip test done in hospitals can be done at home too.
To the left, sodium is considered next (split at 143.5), followed again by specific gravity. This also makes sense: hemoglobin and specific gravity are part of the kidney function tests on a urine strip, and sodium and potassium are mostly reabsorbed by the kidney, so abnormal values here may reflect kidney damage or some other condition the patient is suffering from. Where sodium exceeds 143.5, age becomes the deciding factor: patients aged 43 or below did not have chronic kidney disease, while those above 43 did, so age can be considered a determining factor to a small extent. Staying on the left side of the tree, most patients are classified as having chronic kidney disease; for example, after the split on specific gravity, 141 samples land in a single leaf with no further subdivision. For patients whose hemoglobin was greater than 2.9 there is a pure leaf of 21 samples, and something interesting appears there: the gini index (a measure of diversity in a set) displays as -0.0. A gini of 0 means the leaf is pure, containing only patients with chronic kidney disease; the minus sign is simply a signed floating-point zero. On the other branch, where hemoglobin is at most 2.9 and the split on rbc=normal applies, 4 patients were placed in the chronic kidney disease group and the other 2 were not, completing the left side of the tree.
Let's move to the False subdivision, where hemoglobin at the root was greater than 12.85. Specific gravity is still being considered here, with a split value of 1.017, which is within the normal range on a urine strip test. Below it there is a further subdivision at 0.505, far below any physiological threshold (possibly an artefact of zero-imputed missing values), and 16 patients on that branch were classified as having chronic kidney disease. Another branch checks whether the patient has diabetes mellitus (the pancreas doesn't release enough insulin, or there is insulin resistance): 5 patients without it were classified as not having chronic kidney disease and one with it was classified as having it. This is plausible, since chronic kidney disease is a common consequence of diabetes mellitus.
In the last branch, furthest to the right, samples arrive after a split where specific gravity is greater than 1.017, which is still within the normal range, followed by the dm=yes split. Of the 105 samples there, 104 were classified as not having chronic kidney disease and one was classified as having it.
Most of the leaf nodes have a gini index of 0.0, which means each of those leaves contains samples of a single class. This suggests that some of the features selected by this decision tree could be genuinely useful to examine during diagnosis or disease progression, to assess whether patients could have chronic or acute kidney disease, alongside other features like the ones we'll examine with the logistic regression classifier. As for the gini of -0.0 that looked worrisome: in floating-point arithmetic -0.0 is a signed zero that compares equal to 0.0, so it is a display artifact rather than a bug in the decision tree implementation. Trying a random forest would be interesting too, because the feature dm=yes (whether the patient was in the diabetes mellitus group) keeps reappearing at several points of the tree.
# interpreting the logistic regression model
clf_lr1.predict(x_test[:1])
array([1])
# checking out the intercept: the log-odds the model predicts when every feature is 0
clf_lr1.intercept_
array([5.76320108])
# checking out the coefficients: each is multiplied by the corresponding feature value, e.g. clf_lr1.coef_[0] * age[0]
clf_lr1.coef_
array([[ 0.0063346 , -0.01842495, 0.22342211, 1.32314306, 0.32025324,
0.01268437, -0.00690186, 1.32353171, -0.02486324, -0.20097746,
-0.23277008, -0.0221892 , -0.16596807, 0.19124526, -0.26965719,
-0.0221892 , 0.29493439, -0.01686371, -0.13511728, 0.15506898,
-0.02987156, -0.06799412, 0.10095367, -0.02987156, -0.88891432,
0.92187387, -0.02987156, -0.62832756, 0.66128711, 0.55442474,
-0.44919695, -0.1021398 , -0.01686371, 0.00382733, 0.01612436,
-0.0221892 , -0.38755198, 0.41282917, 0.08652115, 1.7289188 ,
-1.81235197]])
# checking out the available columns
concat_cols_df.columns
Index(['age', 'bp', 'sg', 'al', 'su', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo',
'ane=miss', 'ane=no', 'ane=yes', 'appet=good', 'appet=miss',
'appet=poor', 'ba=miss', 'ba=notpresent', 'ba=present', 'cad=miss',
'cad=no', 'cad=yes', 'dm=miss', 'dm=no', 'dm=yes', 'htn=miss', 'htn=no',
'htn=yes', 'pc=abnormal', 'pc=miss', 'pc=normal', 'pcc=miss',
'pcc=notpresent', 'pcc=present', 'pe=miss', 'pe=no', 'pe=yes',
'rbc=abnormal', 'rbc=miss', 'rbc=normal'],
dtype='object')
# make a dataframe to help easily grab the coefficients for writing the formula and visualizing the data
# transpose to see it clearly
important_features2 = clf_lr1.coef_[0]
column_coef = pd.DataFrame(list(zip(important_features2.tolist(), features)), columns = ["coefficient", "feature"])
column_coef["coefficient"] = column_coef["coefficient"].astype("float32")
column_coef.T
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| coefficient | 0.006335 | -0.018425 | 0.223422 | 1.323143 | 0.320253 | 0.012684 | -0.006902 | 1.323532 | -0.024863 | -0.200977 | ... | -0.10214 | -0.016864 | 0.003827 | 0.016124 | -0.022189 | -0.387552 | 0.412829 | 0.086521 | 1.728919 | -1.812352 |
| feature | age | bp | sg | al | su | bgr | bu | sc | sod | pot | ... | pc=normal | pcc=miss | pcc=notpresent | pcc=present | pe=miss | pe=no | pe=yes | rbc=abnormal | rbc=miss | rbc=normal |
2 rows × 41 columns
# arrange the coefficients in descending order to find the most influential features
column_coef.sort_values(by=["coefficient"], axis = 0, inplace=True, ascending=False)
print(column_coef.head(10))
    coefficient      feature
39     1.728919     rbc=miss
7      1.323532           sc
3      1.323143           al
25     0.921874       dm=yes
28     0.661287      htn=yes
29     0.554425  pc=abnormal
37     0.412829       pe=yes
4      0.320253           su
16     0.294934   appet=poor
2      0.223422           sg
rbc=miss came out as the most important feature. I won't rely on it, since the rbc column mostly has missing values; in fact 152 patient results are missing. The best course of action would be to collect the RBC data from the patients, or to drop the column entirely for modelling -- though there's a caveat with that rationale, since some patients may not come back or may not be able to afford the test. al (albumin), sc (serum creatinine), dm=yes (the patient having diabetes mellitus) and htn (hypertension) are all crucial kidney function tests or predisposing features for a patient who could have chronic kidney disease, based on my background. The other highlighted features are also important, but have lower coefficient values than those mentioned above.
From a more technical perspective these features make sense. Albumin, for instance, is a large protein that normally should not pass through the glomerular filter into the urine, so finding it there suggests the filter is damaged; high blood pressure is yet another predisposing feature that can fuel kidney disease, either acutely or chronically. How does hypertension do it? If the blood pressure is high, the anatomy of the kidney means we'll see a faster rate of filtration, and the pressure may damage the glomerulus. Imagine using a sieve with a fast-flowing liquid carrying particles just a bit larger than the sieve's pores: over a long period, some of those particles may pass through.
x_test[:1] # see all the features in the first row
| age | bp | sg | al | su | bgr | bu | sc | sod | pot | ... | pc=normal | pcc=miss | pcc=notpresent | pcc=present | pe=miss | pe=no | pe=yes | rbc=abnormal | rbc=miss | rbc=normal | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 147 | 60.0 | 60.0 | 1.01 | 3.0 | 1.0 | 288.0 | 36.0 | 1.7 | 130.0 | 3.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
1 rows × 41 columns
x_test.shape # see the number of rows and columns
(100, 41)
# writing the logistic regression formula: z = B0 + B1*x1 + ... + Bn*xn
# based on the wikipedia entry https://en.wikipedia.org/wiki/Logistic_regression
# expanding all 41 terms by hand is error-prone, so compute the
# weighted sum as a dot product over the feature row instead
weights_int_bias = clf_lr1.intercept_ + np.dot(np.asarray(x_test[:1]).ravel(), clf_lr1.coef_[0])
# add the sigmoid function to make the decision
# the sigmoid (logistic) function maps the linear score to a probability
def sigmoid(x):
    return np.exp(x)/(1+np.exp(x))
print(sigmoid(weights_int_bias))
[0.99999414]
# according to wikipedia implementation: sigmoid function
1 / (1 + np.exp(-weights_int_bias))
array([0.99999414])
Grab a single row of features, plug it into the formula above, and confirm that you get the same result as the model's prediction.
clf_lr1.predict(x_test[:1]) # has chronic kidney disease for the patient at index 147
array([1])
Conclusion:¶
I think I was onto something: if you review the feature importances, hemoglobin, specific gravity (sg), sodium (sod), and age were among the most important features. Related features such as al (albumin) and sc (serum creatinine) are medically relevant as well. I've done some tests with a urine dipstick to figure out whether someone has a kidney issue, and these are the parameters that point to kidney dysfunction. Others, like having diabetes mellitus (though this relates to pancreatic beta cell issues), sodium levels reabsorbed by the kidney, and age, could also be indicators. The decision tree is also interesting but could use some tuning. The positive intercept of the logistic regression means that with all features at zero the model leans toward predicting chronic kidney disease, though an all-zero feature vector isn't physiologically meaningful; the decision tree, by contrast, starts from hemoglobin. In future I will update the plots to show more clearly how each decision was made. Otherwise, I'd discuss these results with a medical practitioner such as a nephrologist. These explainable machine learning techniques give us further confidence that the models have robust predictive power as we explain the results to the medical practitioner. The reader can try using a random forest classifier to see if the results are similar. What do you think?